Labeling of Cultural Heritage Collections on the Intersection of Visual Analytics and Digital Humanities
Christofer Meinecke
TL;DR
This paper examines the challenges of applying visual analytics and machine learning to cultural heritage collections, with a focus on labeling, data quality, and the underexplored realm of intangible heritage. It argues that successful interdisciplinary work relies on participatory design and human-in-the-loop labeling. It presents three case studies to illustrate data problems and concrete design takeaways: Interactive Text Edition Alignment, Visualizing Entities in Medieval Manuscripts, and Hierarchical Classification for Medieval Illuminations. The findings highlight issues such as limited data, a lack of ground truth, and vocabulary drift across institutions. They also point to the need for lightweight, transparent labeling workflows and domain-specific vocabularies, which weak supervision can sometimes support. Collectively, the work offers practical guidance for visualization scholars and outlines directions for strengthening GLAM (galleries, libraries, archives, and museums) collaborations at the intersection of digital humanities and visual analytics, including multi-label approaches and strategies for incorporating intangible heritage perspectives.
Abstract
Engaging in interdisciplinary projects at the intersection of visualization and humanities research can be a challenging endeavor. Challenges include finding outcomes that are valuable for both domains and applying state-of-the-art visual analytics methods, such as supervised machine learning algorithms. We discuss these challenges in the context of cultural heritage data. Furthermore, a gap remains in applying these methods to intangible heritage. To reflect on several interdisciplinary projects, we present three case studies that focus on the labeling of cultural heritage collections, the problems and challenges posed by the data, the participatory design process, and the takeaways for visualization scholars from these collaborations.
