Congressional Portal Project

Project Title: Congressional Portal Project

Duration

2022 – Ongoing

Institution

Carl Albert Congressional Research and Studies Center Archives

Project Overview

The Congressional Portal Project is a collaborative research initiative to improve accessibility and analysis of congressional archival records. It provides a repository of workflows, instructional materials, controlled vocabularies, and automated tools to enhance large-scale processing of archival text files.

The project integrates Natural Language Processing (NLP) and OpenAI’s GPT models to automate metadata extraction and standardization across collections. This enables fast, scalable access to previously underutilized congressional materials.

Background

  • Partnerships: West Virginia University Libraries, Robert J. Dole Institute of Politics, Robert C. Byrd Center, The Dirksen Congressional Center, University of Hawai’i at Manoa, and the Russell Library.
  • Goal: Build a centralized portal to unify and interpret congressional collections.
  • Focus Area: Emphasizes American Indian sovereignty and policy-related materials.
  • Evolution: Grew from a set of disparate tools into a streamlined workflow that integrates facial recognition, controlled vocabularies, feedback loops, and MPI-based parallel processing.
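
As a rough illustration of how MPI-based parallel processing can divide a large batch of text files, the sketch below assigns each worker a round-robin share of the file list. The function name `shard_for_rank` and the file names are hypothetical, and the project's actual MPI setup is not described here. In an MPI program (for example with mpi4py, assumed rather than confirmed), `rank` and `size` would come from the communicator instead of a loop.

```python
def shard_for_rank(items, rank, size):
    """Return the round-robin share of `items` owned by worker `rank`.

    `rank` is the worker's index and `size` the total number of workers,
    mirroring the values an MPI communicator reports.
    """
    if size < 1 or not 0 <= rank < size:
        raise ValueError("rank must satisfy 0 <= rank < size")
    return items[rank::size]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = [f"doc_{i:03d}.txt" for i in range(10)]
    # With mpi4py, rank and size would instead come from
    # MPI.COMM_WORLD.Get_rank() and MPI.COMM_WORLD.Get_size().
    for rank in range(4):
        print(rank, shard_for_rank(files, rank, 4))
```

Round-robin assignment keeps the shares within one file of each other in count, which is a reasonable default when file sizes are roughly similar.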
American Congress Digital Archives Portal

Key Tasks

  • Adaptive Learning Models for accuracy in metadata extraction
  • Feedback Loops to improve NLP results and vocabulary evolution
  • Standardization Frameworks for consistent metadata
  • Human-Centered Design models to improve usability

Automating Archival Processes

Machine Learning Algorithms

  • Named Entity Recognition (NER): Identify names, locations, and organizations
  • Topic Modeling: Detect major themes
  • Text Classification: Connect documents to pre-defined vocabularies
  • Sentiment Analysis: Understand tone and political leanings
  • Entity Linking: Match terms to authoritative entities
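
To make the text classification and entity linking steps more concrete, here is a minimal sketch that maps variant phrasings in a document to preferred terms in a controlled vocabulary. It uses only the Python standard library and simple pattern matching, not a trained model. The vocabulary entries and the names `VOCABULARY`, `build_patterns`, and `tag_document` are hypothetical and are not taken from the project's repository.

```python
import re

# Hypothetical controlled vocabulary: preferred term -> variant phrasings.
# These entries are illustrative only, not the project's vocabulary.
VOCABULARY = {
    "Indian Self-Determination": ["self-determination", "tribal self-governance"],
    "Federal Indian Policy": ["federal indian policy", "indian affairs"],
    "Campaign Advertising": ["campaign ad", "campaign ads", "political advertisement"],
}


def build_patterns(vocabulary):
    """Compile one case-insensitive, whole-word pattern per preferred term."""
    patterns = {}
    for preferred, variants in vocabulary.items():
        # Longest alternatives first so longer phrases win over their prefixes.
        alternatives = sorted({preferred.lower(), *variants}, key=len, reverse=True)
        joined = "|".join(re.escape(v) for v in alternatives)
        patterns[preferred] = re.compile(rf"\b(?:{joined})\b", re.IGNORECASE)
    return patterns


def tag_document(text, patterns):
    """Return the preferred terms found in `text` with their match counts."""
    tags = {}
    for preferred, pattern in patterns.items():
        count = len(pattern.findall(text))
        if count:
            tags[preferred] = count
    return tags


if __name__ == "__main__":
    sample = (
        "The committee debated tribal self-governance and reviewed "
        "two campaign ads touching on Indian affairs."
    )
    print(tag_document(sample, build_patterns(VOCABULARY)))
```

Because every variant resolves to a single preferred term, documents that use different wording for the same subject end up with the same tag, which is the property a controlled vocabulary is meant to provide.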

Feedback & Verification

  • Contextual Analysis: Account for figurative language
  • Multi-step Verification: Cross-check extracted people, dates, and organizations
  • Controlled Vocabularies: Ensure accurate tagging and annotation
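
Multi-step verification can be sketched for dates as a short chain of checks, each of which can reject a candidate with a reason. The example below is an assumption about what such a chain might look like, not the project's implementation. The function name `verify_dates`, the "Month D, YYYY" pattern, and the default lower bound of 1789 (the year the First Congress convened) are illustrative choices.

```python
import re
from datetime import date

MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Matches dates written as "Month D, YYYY"; other formats are out of scope here.
DATE_PATTERN = re.compile(r"\b([A-Za-z]+) (\d{1,2}), (\d{4})\b")


def verify_dates(text, earliest=1789, latest=None):
    """Extract candidate dates and run three checks on each one.

    Step 1: the month name must be a real month.
    Step 2: the day must exist in that month and year.
    Step 3: the year must fall inside a plausible range (hypothetical bounds).
    Returns (verified ISO dates, rejected candidates with reasons).
    """
    latest = latest or date.today().year
    verified, rejected = [], []
    for match in DATE_PATTERN.finditer(text):
        month = MONTHS.get(match.group(1).lower())
        if month is None:
            rejected.append((match.group(0), "unknown month"))
            continue
        try:
            parsed = date(int(match.group(3)), month, int(match.group(2)))
        except ValueError:
            rejected.append((match.group(0), "impossible calendar date"))
            continue
        if not earliest <= parsed.year <= latest:
            rejected.append((match.group(0), "year out of range"))
            continue
        verified.append(parsed.isoformat())
    return verified, rejected
```

Rejected candidates keep their reason so that a reviewer can decide whether the source text is wrong or the extraction was, which is where a feedback loop would pick up.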

Repository Folders


Collections Overview

The project supports more than 75,677 items across four key collections at the CAC Archives and our Digital Archives Platform.

| Collection | Type | Topics | Subtopics | Significance | Extent | Formats |
| --- | --- | --- | --- | --- | --- | --- |
| Indian Self-Determination | Topical | Congress as policy-maker, Leaders & parties | Committee leadership, Constituent comms | Strategic tribal and congressional interactions | 23 collections | PDF/A, PDF/E, TIFF |
| Robert L. Owen Collection | Whole | Congress, Courts, Culture | Federal Indian policy | Cherokee Nation leader and federal Indian agent | 199 items | PDF/A, TIFF |
| U.S. House Campaign Ads | Whole | Elections, Interest groups | Tactics, Outcomes | Historic TV/radio campaign ads | 24,678 items | MJPEG 2000, MOV, AVI |
| Carl Albert Photos | Whole | Party leadership | | Photographs throughout Albert’s career | 11,000 items | TIFF |