Auto-Cataloging Research Materials from the Endangered Archives Programme

Facilitated by

Ann Farnsworth-Alvear, Associate Professor of History, University of Pennsylvania

Andrew Janco, Digital Scholarship Specialist, Princeton University Library

Building on our experience with materials from the British Library’s Endangered Archives Programme and with 19th-century court records from the Circuit Court of Istmina, Chocó, Colombia, which are damaged and include both manuscript and typescript documents, we will teach participants how to use AI to generate research data from document images. Using vision-language models (VLMs), we will demonstrate how to extract text and metadata from images and publish the resulting data in accessible formats for research. Converting images into structured, machine-readable data makes it possible to search and analyze collections of digitized documents, supporting both exploratory analysis and computational research methods. Machine-readable text also makes documents more accessible to researchers, students, and the public. In addition, auto-cataloging the collection provides standardized metadata, such as occurrences of personal names and place names and summaries of each case, which supports research across the entire collection. VLMs are particularly useful here both for handwritten text recognition and for describing materials at the case, box, and collection levels.
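The session's own pipeline, prompts, and output schema are not described here. As a minimal sketch, assuming a VLM has already returned a transcription plus lists of person and place names for each page image, the standardized metadata could be held in one record per page and aggregated across the collection, then published as JSON Lines. The `PageRecord` schema, its field names, the helper functions, and the sample values are all hypothetical illustrations, not the facilitators' tools.

```python
import json
from collections import Counter, defaultdict
from dataclasses import dataclass, field, asdict


# Hypothetical per-page record; field names are illustrative assumptions,
# not the workshop's actual schema.
@dataclass
class PageRecord:
    image_file: str     # identifier of the digitized image
    box: str            # archival box the page belongs to
    case_id: str        # court case the page belongs to
    transcription: str  # text returned by the VLM (HTR output)
    person_names: list[str] = field(default_factory=list)
    place_names: list[str] = field(default_factory=list)


def name_index(records):
    """Count occurrences of each person and place name across all pages."""
    people, places = Counter(), Counter()
    for rec in records:
        people.update(rec.person_names)
        places.update(rec.place_names)
    return people, places


def pages_by_case(records):
    """Group image identifiers by case, the level at which summaries apply."""
    cases = defaultdict(list)
    for rec in records:
        cases[rec.case_id].append(rec.image_file)
    return dict(cases)


def to_jsonl(records):
    """Serialize records as JSON Lines: one page per line, UTF-8 text kept."""
    return "\n".join(json.dumps(asdict(r), ensure_ascii=False) for r in records)


if __name__ == "__main__":
    # Placeholder data for illustration only.
    pages = [
        PageRecord("img_0001.jpg", "Box 1", "case-01", "(transcription)",
                   ["Person A"], ["Istmina"]),
        PageRecord("img_0002.jpg", "Box 1", "case-01", "(transcription)",
                   ["Person A", "Person B"], ["Istmina"]),
    ]
    people, places = name_index(pages)
    print(people.most_common())  # [('Person A', 2), ('Person B', 1)]
    print(pages_by_case(pages))
    print(to_jsonl(pages))
```

JSON Lines keeps one page per line, so large collections can be written and read incrementally; a case summary could be attached to each group returned by `pages_by_case`.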

Date:: Friday, November 21, 2025
Time:: 3:00pm - 5:00pm
Location:: Commons Library Classroom (D112)
Audience:: Princeton Faculty/Researcher, Princeton Student
Categories:: Data & Computation