Political Campaign Commercial Collection Reprocessing

Project Title: Political Campaign Commercial Collection Reprocessing

Duration

2021 – ongoing

Institution

Carl Albert Congressional Research and Studies Center Archives
(In collaboration with Harvard University and the University of Iowa)

Project Overview

The Carl Albert Center Archives, along with Harvard University and the University of Iowa, was awarded a collaborative research grant from the National Science Foundation (NSF) for the project:
“Understanding the Evolution of Political Campaign Advertisements over the Last Century.”

The project has three major aims:

  1. Make a large underutilized collection of over 120,000 political ads (1912–2016) suitable for academic and public research.
  2. Understand the evolution of political advertising, particularly regarding issue advocacy and gender/minority representations before 1996.
  3. Promote interdisciplinary education in audiovisual analysis for graduate and undergraduate researchers.

Our work at the Carl Albert Center focused on aim 1: delivering a cleaned, structured, and accessible dataset to our collaborators. This page documents our process and the solutions we developed during the reprocessing effort.

Key Research Components

  • Addressed challenges of insufficient documentation, undefined workflows, and limited funding.
  • Developed scalable workflows for managing large AV collections.
  • Ensured long-term access and usability of digital ad materials.
  • Shared practical solutions for academic archival environments.

Workflow Documentation & Resources

Case Study & Collection Background

  • Origin, acquisition, and growth timeline of the ad collection
  • Collection complexity and disorder: the entropy effect

Infrastructure & System Challenges

  • Backlogs caused by the “internet effect” and continuous digital growth
  • Issues with format normalization, file duplication, and legacy media
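File duplication is easier to detect by content than by filename, because copies of the same ad often carry different names. Files with identical bytes share a cryptographic hash. The sketch below is a minimal illustration, not the project's actual script. The choice of SHA-256 and the idea of scanning a single intake folder tree are assumptions.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large video files never load fully into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by content hash and keep only groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Hashing every file is I/O-heavy on a large AV collection. A common refinement is to group files by size first and hash only files that share a size, since files of different sizes cannot be byte-identical.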

Initial Tools & Methods

  • Python templates for batch renaming, classification, and metadata mapping
  • Control methods: Unified Component ID (P_COPY-OID) system
  • Audio/video data cleansing, aggregation, and forensic analysis
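Batch renaming against a unified ID can be split into two steps: first build a rename plan that can be reviewed, then apply it. The sketch below assumes a hypothetical ID layout of `<prefix>_<copy>-<oid>` (for example `P_01-000123`) as one reading of "P_COPY-OID". The real field widths and separators of the P_COPY-OID system are not documented here. The mapping from original filename to (object ID, copy number) is also assumed to come from an existing inventory.

```python
from pathlib import Path


def make_component_id(oid: int, copy: int, prefix: str = "P") -> str:
    """Build an ID in the ASSUMED form <prefix>_<copy>-<oid>, e.g. P_01-000123."""
    return f"{prefix}_{copy:02d}-{oid:06d}"


def plan_renames(files: dict[str, tuple[int, int]]) -> dict[str, str]:
    """Map each original filename to its ID-based name, keeping a lowercased extension.

    `files` maps original filename -> (oid, copy). Raises ValueError if two
    files would receive the same new name, so nothing is silently overwritten.
    """
    plan: dict[str, str] = {}
    seen: set[str] = set()
    for original, (oid, copy) in sorted(files.items()):
        new_name = make_component_id(oid, copy) + Path(original).suffix.lower()
        if new_name in seen:
            raise ValueError(f"ID collision: {original} -> {new_name}")
        seen.add(new_name)
        plan[original] = new_name
    return plan


def apply_renames(folder: Path, plan: dict[str, str]) -> None:
    """Rename files on disk after review; refuse to overwrite an existing file."""
    for original, new_name in plan.items():
        target = folder / new_name
        if target.exists():
            raise FileExistsError(target)
        (folder / original).rename(target)
```

Separating the plan from the rename keeps the step reversible. The plan dictionary can be saved as a crosswalk between legacy filenames and component IDs before anything on disk changes.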

Implementation & Phase 1 Workflow

  • Digitization and file intake
  • Batch tagging, error detection, and AI-assisted classification
  • Manual QC and correction protocols
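Batch tagging and error detection at intake can begin with simple file-level checks before any AI-assisted step runs. The following is a minimal sketch that assumes files are tagged by extension. The `MEDIA_TAGS` table is a hypothetical example, not the project's actual format list.

```python
from pathlib import Path

# HYPOTHETICAL extension-to-tag table; substitute the project's real format list.
MEDIA_TAGS = {
    ".mp4": "video", ".mov": "video", ".avi": "video", ".mpg": "video",
    ".wav": "audio", ".mp3": "audio",
    ".jpg": "image", ".tif": "image",
}


def tag_intake(root: Path) -> tuple[dict[Path, str], list[tuple[Path, str]]]:
    """Tag each file by media type and collect files that need manual QC."""
    tags: dict[Path, str] = {}
    errors: list[tuple[Path, str]] = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Zero-byte files usually mean a failed transfer or capture.
        if path.stat().st_size == 0:
            errors.append((path, "empty file"))
            continue
        tag = MEDIA_TAGS.get(path.suffix.lower())
        if tag is None:
            errors.append((path, f"unrecognized extension {path.suffix or '(none)'}"))
        else:
            tags[path] = tag
    return tags, errors
```

Under this design, only the tagged files move on to automated classification. The error list becomes the queue for the manual QC and correction step.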

Sample Student Access Workflows

  • Group A–B: Basic metadata cleanup
  • Group C: Enhanced tagging & AV sync corrections
  • Group QA-Rover-1: Advanced metadata validation & exception reporting
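Metadata validation and exception reporting can be expressed as a set of row-level rules that produce a reviewable report. The sketch below is a hedged illustration. The column names `component_id`, `title`, and `year` are assumptions, not the project's schema. The year bounds come from the collection span (1912–2016) stated in the project aims.

```python
import csv
from collections.abc import Iterable
from typing import TextIO

# ASSUMED required columns; replace with the project's actual metadata schema.
REQUIRED = ("component_id", "title", "year")
YEAR_RANGE = (1912, 2016)  # span of the ad collection


def validate_rows(rows: Iterable[dict[str, str]]) -> list[dict[str, str]]:
    """Return one exception record per problem found in the metadata rows."""
    exceptions: list[dict[str, str]] = []
    # start=2 because line 1 of the CSV is the header (assumes no multi-line fields)
    for line_no, row in enumerate(rows, start=2):
        for field in REQUIRED:
            if not (row.get(field) or "").strip():
                exceptions.append({"line": str(line_no), "field": field, "problem": "missing value"})
        year = (row.get("year") or "").strip()
        if year and not (year.isdecimal() and YEAR_RANGE[0] <= int(year) <= YEAR_RANGE[1]):
            exceptions.append({"line": str(line_no), "field": "year", "problem": f"invalid year: {year}"})
    return exceptions


def write_report(exceptions: list[dict[str, str]], out: TextIO) -> None:
    """Write the exception report as CSV so a reviewer can sort and annotate it."""
    writer = csv.DictWriter(out, fieldnames=["line", "field", "problem"])
    writer.writeheader()
    writer.writerows(exceptions)
```

In use, a validator would open the metadata file with `newline=""`, pass `csv.DictReader(source)` to `validate_rows`, and hand the result to `write_report` along with an output file. Line numbers in the report let a student jump straight to the offending record.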

Research Tools & Appendices

Appendix 1

Building a Python environment (step-by-step setup guide)
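The core of the setup can also be scripted with Python's standard `venv` module, so every student machine starts from the same isolated environment. This is a minimal sketch. The environment folder and any package names passed in are up to the user. The actual package list belongs to the setup guide and is not reproduced here.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir: Path) -> Path:
    """Path to the interpreter inside a venv (Windows uses Scripts/, others use bin/)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def create_env(env_dir: Path, packages: tuple[str, ...] = (), with_pip: bool = True) -> Path:
    """Create a virtual environment and optionally install packages into it."""
    if packages and not with_pip:
        raise ValueError("installing packages requires with_pip=True")
    venv.create(env_dir, with_pip=with_pip)
    python = env_python(env_dir)
    if packages:
        # Run pip with the environment's own interpreter so packages land inside it.
        subprocess.run([str(python), "-m", "pip", "install", *packages], check=True)
    return python
```

Calling pip through the environment's own interpreter, rather than a global `pip` command, avoids installing into the wrong Python when several versions are present.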

Appendix 2

Common workflows for AV transcription

  • 2.1: Creating transcripts & summaries (free tools)
  • 2.2: Accuracy testing – Whisper AI vs. Gensim + NLTK
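Word error rate (WER) is one standard way to score a machine transcript against a hand-checked reference. It counts the word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, then divides by the reference length. The appendix does not state which metric its comparison used, so the following is a hedged illustration of WER, not a record of the project's method. Lowercasing and whitespace tokenization are simplifying assumptions. Punctuation is not stripped.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring "vote for jones" against the reference "vote for smith" gives one substitution over three reference words, a WER of about 0.33. Lower is better, and a value above 1.0 is possible when the hypothesis has many inserted words.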

Temporary Video Viewing Platform

Political Ads Interface (hosted on the CAC Digital Archives)