The Role of Generative Systems in Historical Photography Management: A Case Study on Catalan Archives
Èric Sánchez, Adrià Molina, Oriol Ramos Terrades
TL;DR
The study examines how generative systems influence automated captioning for historical Catalan archives, addressing language bias and historical-domain shift. It deploys the compact CATR captioning framework and evaluates the contributions of image generation and text generation using synthetic data and multilingual pretraining. Key findings show that natural images with translated captions outperform synthetic-data-only strategies, while language proximity and data scale significantly shape performance; synthetic images offer limited gains and can introduce noise. The work provides practical guidance for heritage institutions on transfer-learning configurations and highlights the need for domain-adaptation methods to responsibly apply generative tools in historical, multilingual contexts.
Abstract
The use of image analysis in automated photography management is a growing trend in heritage institutions. Such tools reduce the human cost of the expensive manual annotation of new data sources while giving the public fast access through online indexes and search engines. However, available tagging and description tools are usually designed around modern photographs in English, neglecting historical corpora in minoritized languages, each of which exhibits intrinsic particularities. The primary objective of this research is to study the quantitative contribution of generative systems to the description of historical sources. We do so by framing the captioning of historical photographs from the Catalan archives as a case study. Our findings provide practitioners with tools and directions on transfer learning for captioning models based on visual adaptation and linguistic proximity.
