Political Campaign Commercial Collection Reprocessing
Duration
2021 – Ongoing
Institution
Carl Albert Congressional Research and Studies Center Archives
(In collaboration with Harvard University and the University of Iowa)
Project Overview
The Carl Albert Center Archives, along with Harvard University and the University of Iowa, was awarded a collaborative research grant from the National Science Foundation (NSF) for the project:
“Understanding the Evolution of Political Campaign Advertisements over the Last Century.”
The project focused on three major aims:
- Make a large, underutilized collection of over 120,000 political ads (1912–2016) suitable for academic and public research.
- Understand the evolution of political advertising, particularly regarding issue advocacy and gender and minority representation before 1996.
- Promote interdisciplinary education in audiovisual analysis for graduate and undergraduate researchers.
Our task at the Carl Albert Center focused on aim #1, delivering a cleaned, structured, and accessible dataset for collaborators. This page documents our process and the innovative solutions developed during the reprocessing effort.
Key Research Components
- Addressed challenges of insufficient documentation, undefined workflows, and limited funding.
- Developed scalable workflows for managing large AV collections.
- Ensured long-term access and usability of digital ad materials.
- Shared practical solutions for academic archival environments.
Workflow Documentation & Resources
Case Study & Collection Background
- Origin, acquisition, and growth timeline of the ad collection
- Collection complexity and disorder: the entropy effect
Infrastructure & System Challenges
- Backlogs due to the “internet effect” and constant digital growth
- Issues with format normalization, file duplication, and legacy media
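The file-duplication problem noted above can be surfaced with a simple checksum pass. The sketch below is a minimal illustration of that idea, assuming SHA-256 comparison over an intake folder; the directory name and extension filter are placeholders, not the project's actual storage layout.

```python
# Minimal sketch: flag duplicate video files by comparing SHA-256 checksums.
# The "intake/" folder and ".mp4" filter are hypothetical placeholders.
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large AV files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

seen: dict[str, Path] = {}
for video in Path("intake/").rglob("*.mp4"):
    checksum = sha256_of(video)
    if checksum in seen:
        print(f"Duplicate content: {video} matches {seen[checksum]}")
    else:
        seen[checksum] = video
```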
Initial Tools & Methods
- Python templates for batch renaming, classification, metadata mapping (see the renaming sketch after this list)
- Control methods: Unified Component ID (P_COPY-OID) system
- Audio/video data cleansing, aggregation, forensic analysis
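As referenced above, one way the renaming templates and the unified component ID can fit together is a script driven by a mapping file. The sketch below is illustrative only: the `P_COPY-<id>` filename pattern, the manifest columns, and the folder names are invented stand-ins, since the real P_COPY-OID structure is defined in the project's workflow documentation.

```python
# Illustrative template-driven batch rename keyed to a unified component ID.
# Manifest columns ("original_name", "component_id") and paths are assumptions.
import csv
from pathlib import Path

SOURCE_DIR = Path("digitized/")            # hypothetical digitization output
MANIFEST = Path("rename_manifest.csv")     # hypothetical mapping file

with MANIFEST.open(newline="", encoding="utf-8") as fh:
    # Each row is assumed to map an original filename to its component ID.
    mapping = {row["original_name"]: row["component_id"] for row in csv.DictReader(fh)}

for original, component_id in mapping.items():
    src = SOURCE_DIR / original
    if not src.exists():
        print(f"Missing from intake: {original}")
        continue
    # e.g. "tape_0042.mov" -> "P_COPY-000123.mov"
    dest = src.with_name(f"P_COPY-{component_id}{src.suffix}")
    src.rename(dest)
    print(f"Renamed {src.name} -> {dest.name}")
```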
Implementation & Phase 1 Workflow
- Digitization and file intake
- Batch tagging, error detection, AI-assisted classification (see the classification sketch after this list)
- Manual QC and correction protocols
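For the AI-assisted classification step named above, a full model is out of scope on this page; as a hedged stand-in, the sketch below uses a plain keyword heuristic to pre-sort ad transcripts into rough buckets that the manual QC pass can confirm or correct. The bucket names, keyword lists, and transcripts folder are all invented for illustration.

```python
# Stand-in for AI-assisted classification: a keyword heuristic that pre-sorts
# transcripts into rough buckets for human review. All names are illustrative.
from pathlib import Path

BUCKETS = {
    "attack": {"opponent", "failed", "wrong", "record"},
    "issue": {"economy", "jobs", "education", "healthcare", "war"},
    "biographical": {"family", "served", "grew", "hometown"},
}

def rough_bucket(text: str) -> str:
    """Return the bucket whose keywords overlap the transcript most."""
    words = set(text.lower().split())
    scores = {name: len(words & keywords) for name, keywords in BUCKETS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"

for transcript in Path("transcripts/").glob("*.txt"):   # hypothetical folder
    label = rough_bucket(transcript.read_text(encoding="utf-8"))
    print(f"{transcript.name}\t{label}")
```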
Sample Student Access Workflows
- Group A–B: Basic metadata cleanup
- Group C: Enhanced tagging & AV sync corrections
- Group QA-Rover-1: Advanced metadata validation & exception reporting
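A validation pass of the kind assigned to the QA group might look like the sketch below: confirm that required metadata fields are populated and that the year falls inside the collection's 1912–2016 span, then write the exceptions to a report. The field names and file names are assumptions, not the project's actual schema.

```python
# Sketch of metadata validation with exception reporting.
# Column names and file names are assumed for illustration.
import csv

REQUIRED_FIELDS = ["component_id", "candidate", "office", "year"]  # assumed schema

with open("ad_metadata.csv", newline="", encoding="utf-8") as fh:
    rows = list(csv.DictReader(fh))

report = []
for line_no, row in enumerate(rows, start=2):   # header is line 1
    missing = [f for f in REQUIRED_FIELDS if not (row.get(f) or "").strip()]
    if missing:
        report.append(f"row {line_no}: missing {', '.join(missing)}")
    year = (row.get("year") or "").strip()
    if year and not (year.isdigit() and 1912 <= int(year) <= 2016):
        report.append(f"row {line_no}: year outside 1912-2016 ({year!r})")

with open("metadata_exceptions.txt", "w", encoding="utf-8") as fh:
    fh.write("\n".join(report))

print(f"{len(report)} exceptions written to metadata_exceptions.txt")
```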
Research Tools & Appendices
Appendix 1
Building a Python environment (step-by-step setup guide)
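As a compressed preview of the appendix, the sketch below creates a virtual environment and installs a plausible package set from Python itself; the environment name and package list are assumptions drawn from tools mentioned elsewhere on this page, and the full step-by-step guide remains the authoritative version.

```python
# Minimal environment setup sketch: build a venv and install assumed packages.
import subprocess
import sys
import venv
from pathlib import Path

ENV_DIR = Path("ad-analysis-env")                          # hypothetical name
PACKAGES = ["openai-whisper", "gensim", "nltk", "pandas"]  # assumed package set

# Create the isolated environment with pip available.
venv.EnvBuilder(with_pip=True).create(ENV_DIR)

# Locate the environment's pip and install the packages into it.
pip = ENV_DIR / ("Scripts/pip.exe" if sys.platform == "win32" else "bin/pip")
subprocess.run([str(pip), "install", *PACKAGES], check=True)
```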
Appendix 2
Common workflows for AV transcription
- 2.1: Creating transcripts & summaries (free tools; sketched below)
- 2.2: Accuracy testing – Whisper AI vs. Gensim + NLTK
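The sketch below compresses the 2.1 workflow into one script: transcribe a single commercial with Whisper, then build a crude frequency-based extractive summary with NLTK. The filename and model size are illustrative assumptions, and the Gensim comparison from 2.2 is left to the appendix itself.

```python
# Transcribe one ad with Whisper, then rank sentences by frequent terms (NLTK).
# Filename and model size are illustrative assumptions.
from collections import Counter

import nltk
import whisper
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)

# 1. Transcribe the ad's audio track (Whisper calls ffmpeg under the hood).
model = whisper.load_model("base")
transcript = model.transcribe("example_ad_1968.mp4")["text"]   # hypothetical file

# 2. Score sentences by how many frequent, non-stopword terms they contain.
stops = set(stopwords.words("english"))
terms = [w.lower() for w in word_tokenize(transcript)
         if w.isalpha() and w.lower() not in stops]
freq = Counter(terms)
sentences = sent_tokenize(transcript)
ranked = sorted(sentences,
                key=lambda s: sum(freq[w.lower()] for w in word_tokenize(s)),
                reverse=True)

# 3. Keep the top-scoring sentences as a rough summary for review.
print(" ".join(ranked[:3]))
```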
Temporary Video Viewing Platform
