Congressional Portal Project
Project Title: Congressional Portal Project
Duration
2022 – Ongoing
Institution
Carl Albert Congressional Research and Studies Center Archives
Project Overview
The Congressional Portal Project is a collaborative research initiative to improve accessibility and analysis of congressional archival records. It provides a repository of workflows, instructional materials, controlled vocabularies, and automated tools to enhance large-scale processing of archival text files.
The project integrates Natural Language Processing (NLP) and OpenAI’s GPT models to automate metadata extraction and standardization across collections. This enables fast, scalable access to previously underutilized congressional materials.
Background
- Partnerships: West Virginia University Libraries, Robert J. Dole Institute of Politics, Robert C. Byrd Center, The Dirksen Congressional Center, University of Hawai’i at Manoa, and the Russell Library.
- Goal: Build a centralized portal to unify and interpret congressional collections.
- Focus Area: American Indian sovereignty and policy-related materials.
- Evolution: Grew from disparate tools into a streamlined workflow that integrates facial recognition, controlled vocabularies, feedback loops, and MPI-based parallel processing.

Key Tasks
- Adaptive Learning Models for accuracy in metadata extraction
- Feedback Loops to improve NLP results and vocabulary evolution
- Standardization Frameworks for consistent metadata
- Human-Centered Design models to improve usability
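
As an illustration of the standardization task listed above, the following is a minimal sketch, not the project's actual code. It assumes extracted metadata arrives as a plain dictionary with a free-text `date` field; the format list and the function names `standardize_date` and `standardize_record` are hypothetical.

```python
from datetime import datetime

# Hypothetical set of date formats that might appear in extracted metadata.
DATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%b %d, %Y", "%d %B %Y"]

def standardize_date(raw):
    """Return an ISO 8601 date string, or None if no known format matches."""
    text = " ".join(raw.split())  # collapse stray whitespace before parsing
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None

def standardize_record(record):
    """Collapse whitespace in every field and normalize the 'date' field."""
    clean = {key: " ".join(str(value).split()) for key, value in record.items()}
    clean["date"] = standardize_date(clean.get("date", ""))
    return clean

print(standardize_record({"title": "  Letter  to   Speaker ", "date": "4 March 1975"}))
```

Returning `None` for an unparseable date, rather than guessing, leaves the record flagged for a human reviewer.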
Automating Archival Processes
Machine Learning Algorithms
- NER: Identify names, locations, and organizations
- Topic Modeling: Detect major themes
- Text Classification: Connect documents to pre-defined vocabularies
- Sentiment Analysis: Understand tone and political leanings
- Entity Linking: Match terms to authoritative entities
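
The text classification step above can be sketched with simple phrase matching against a controlled vocabulary. This is a deliberately simplified stand-in for the NLP and GPT-based models the project describes; the `VOCABULARY` contents and the `classify` function are assumptions made for illustration only.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: preferred term -> trigger phrases.
VOCABULARY = {
    "Federal Indian policy": ["indian policy", "self-determination", "tribal sovereignty"],
    "Elections": ["campaign", "election", "ballot"],
    "Party leadership": ["speaker of the house", "majority leader", "whip"],
}

def classify(text, vocabulary=VOCABULARY, min_hits=1):
    """Return vocabulary terms whose trigger phrases occur in the text, most hits first."""
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = Counter()
    for term, phrases in vocabulary.items():
        for phrase in phrases:
            # Word-boundary match so "whip" does not fire on "whipped".
            pattern = r"\b" + re.escape(phrase) + r"\b"
            hits[term] += len(re.findall(pattern, normalized))
    return [term for term, count in hits.most_common() if count >= min_hits]

sample = "Letter on tribal sovereignty and the Indian policy debate before the election."
print(classify(sample))  # ['Federal Indian policy', 'Elections']
```

A production workflow would likely replace the phrase lists with a trained classifier, but the output shape (a ranked list of preferred vocabulary terms per document) would stay the same.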
Feedback & Verification
- Contextual Analysis: Account for figurative language
- Multi-step Verification: For people, dates, organizations
- Controlled Vocabularies: Ensure accurate tagging and annotation
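
One way to picture multi-step verification of an extracted person name is the sketch below. It assumes a small authority list of preferred name forms and uses the standard library's `difflib` for fuzzy matching; the list contents, the 0.85 threshold, and the status labels are hypothetical, not taken from the project.

```python
import difflib
import re

# Hypothetical authority list of preferred name forms.
AUTHORITY_NAMES = ["Albert, Carl", "Owen, Robert L.", "Byrd, Robert C.", "Dole, Robert J."]

def normalize(name):
    """Lowercase, drop punctuation, and sort name parts so word order does not matter."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(sorted(cleaned.split()))

def verify_person(extracted, authority=AUTHORITY_NAMES, threshold=0.85):
    """Return (preferred_form, status) for an extracted name."""
    index = {normalize(entry): entry for entry in authority}
    key = normalize(extracted)
    # Step 1: exact match on the normalized form.
    if key in index:
        return index[key], "verified"
    # Step 2: fuzzy match; a near miss is routed to a human reviewer.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=threshold)
    if close:
        return index[close[0]], "needs_review"
    # Step 3: no plausible candidate in the vocabulary.
    return None, "unmatched"

for name in ["Carl Albert", "Robert L Owens", "Jane Q. Public"]:
    print(name, "->", verify_person(name))
```

Names that come back as `needs_review` or `unmatched` are the natural input to a feedback loop: a reviewer's decision can either confirm the link or add a new entry to the vocabulary.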
Repository Folders
- documentation-applications-list: Worksheets, indexes, vocabularies, training data
- workflows: Executable scripts and batch processes
- deprecated-packages: Retired tools and references
Collections Overview
The project supports more than 75,677 items across four key collections at the CAC Archives and our Digital Archives Platform.
Collection | Type | Topics | Subtopics | Significance | Extent | Formats |
---|---|---|---|---|---|---|
Indian Self-Determination | Topical | Congress as policy-maker, Leaders & parties | Committee leadership, Constituent comms | Strategic tribal and congressional interactions | 23 collections | PDF/A, PDF/E, TIFF |
Robert L. Owen Collection | Whole | Congress, Courts, Culture | Federal Indian policy | Cherokee Nation leader and federal Indian agent | 199 items | PDF/A, TIFF |
U.S. House Campaign Ads | Whole | Elections, Interest groups | Tactics, Outcomes | Historic TV/radio campaign ads | 24,678 items | MJPEG 2000, MOV, AVI |
Carl Albert Photos | Whole | Party leadership | | Photographs throughout Albert’s career | 11,000 items | TIFF |