Is Medieval Distant Viewing Possible? : Extending and Enriching Annotation of Legacy Image Collections using Visual Analytics

Christofer Meinecke, Estelle Guéville, David Joseph Wrisley, Stefan Jänicke

TL;DR

Legacy cultural heritage image collections suffer from non-homogeneous metadata and vocabulary drift, hindering retrieval and machine learning applicability. The paper presents a participatory visual analytics system that unifies vocabularies from Mandragore and Initiale, builds a high-quality label hierarchy, and supports distant viewing through embeddings and interactive labeling. Key contributions include a multi-layered visual analytics platform, a multi-view re-annotation environment, and a co-constructed label hierarchy that bridges legacy knowledge bases for cross-dataset discovery. The approach offers a generalizable framework for enriching legacy collections with ML-ready metadata, enabling improved discoverability and downstream tasks across cultural heritage corpora.

Abstract

Distant viewing approaches have typically used image datasets close to the contemporary image data used to train machine learning models. Working with images from other historical periods requires expert-annotated data, and the quality of labels is crucial for the quality of results. Especially when working with cultural heritage collections that contain myriad uncertainties, annotating data, or re-annotating legacy data, is an arduous task. In this paper, we describe working with two pre-annotated sets of medieval manuscript images that exhibit conflicting and overlapping metadata. Since a manual reconciliation of the two legacy ontologies would be very expensive, we aim (1) to create a more uniform set of descriptive labels to serve as a "bridge" in the combined dataset, and (2) to establish a high-quality hierarchical classification that can be used as a valuable input for subsequent supervised machine learning. To achieve these goals, we developed visualization and interaction mechanisms, enabling medievalists to combine, regularize and extend the vocabulary used to describe these, and other cognate, image datasets. The visual interfaces provide experts with an overview of relationships in the data going beyond the sum total of the metadata. Word and image embeddings, as well as co-occurrences of labels across the datasets, enable batch re-annotation of images and recommendation of label candidates, and support composing a hierarchical classification of labels.
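
The abstract mentions label co-occurrences as one source of label recommendations. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the simple count-based scoring and the toy annotations are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(image_labels):
    """Count how often each pair of labels appears on the same image."""
    cooc = defaultdict(Counter)
    for labels in image_labels.values():
        for a, b in combinations(sorted(set(labels)), 2):
            cooc[a][b] += 1
            cooc[b][a] += 1
    return cooc


def recommend(cooc, selected, k=3):
    """Rank labels that co-occur with the selected ones.

    Scores are summed co-occurrence counts (an assumed scoring rule);
    labels that are already selected are excluded.
    """
    scores = Counter()
    for label in selected:
        scores.update(cooc.get(label, {}))
    for label in selected:
        scores.pop(label, None)
    # Sort by descending count, then alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [label for label, _ in ranked[:k]]


# Toy annotations, invented for illustration (not taken from Mandragore or Initiale).
image_labels = {
    "img1": ["david", "harpe", "couronne"],
    "img2": ["david", "couronne"],
    "img3": ["david", "harpe", "musique"],
    "img4": ["oiseau", "arbre"],
}

cooc = build_cooccurrence(image_labels)
suggestions = recommend(cooc, ["harpe"])
print(suggestions)
```

In this toy data, selecting "harpe" surfaces the labels that most often share an image with it. A real system would likely normalize the counts and combine them with embedding-based similarity.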

Paper Structure (20 sections, 7 figures)

Figures (7)

  • Figure 1: Examples taken from 1 Kings in the dataset with their labels. The upper ones (from Mandragore) and the lower ones (from Initiale) illustrate the possible variation of the images in the dataset. Even in these cases, where the same scene is depicted, preservation status and background colors vary. They also show how Mandragore and Initiale focused on different concepts in the images, with Initiale including positions and gestures. Depicted here are four manuscripts: Bibliothèque nationale de France, Latin 18, folio 104r and Latin 17947, folio 112r, Bibliothèque Mazarine 38, folio 230v, and Tours, Bibliothèque municipale 8, folio 230r.
  • Figure 2: Systematic overview of our image and label exploration and annotation workflow.
  • Figure 3: An excerpt of the manuscript graph based on label similarity. Blue nodes are part of Initiale and red nodes are part of Mandragore. The graph shows the separation of the two datasets and the similarity between the manuscripts within each dataset. The grey area marks the user-selected manuscripts, some of which have overlapping labels (connected) while others do not (without connection).
  • Figure 4: A point cloud of images based on the word embeddings of the labels (a), where only images with labels are visible. After selecting some of the images and switching to the embeddings of the textual descriptions (b), several images without labels are displayed next to or on top of already labeled images. Notably, several images of Vendôme, Bibliothèque municipale 1 have almost the same description as some of the already labeled images. This visual design allows for finding sets of images with the same or similar content amongst a set of labeled and unlabeled ones.
  • Figure 5: The annotation space (a) shows four manuscripts and their labels arranged in columns. Some labels were added by different users; others were replaced by more specific ones. The word space (b) shows words that are similar to the ones currently selected in the annotation space. Here it is shown that after selecting "instrument de musique" (musical instrument) multiple words related to music were added. The recommended co-occurrences (c) show, for example, related terms such as "musique" (music) and "couronne" (crown), suggesting the relationship between music and King David. The recommended neighbors from the most similar images based on computed features (d) contain entries about animals. The excerpt of the label hierarchy (e) shows multiple labels of "oiseaux" (birds) asserted by the user. Manuscripts represented include Bibliothèque nationale de France Latin 32, folio 443r, and Latin 27, folio 226v, and Paris, Bibliothèque interuniversitaire de la Sorbonne 11, folio 150r and folio 169v.
  • ...and 2 more figures
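
The manuscript graph in Figure 3 connects manuscripts by label similarity. The sketch below shows one way such a graph could be derived. The source does not state which similarity measure is used, so Jaccard similarity over label sets, the threshold value, the function names and the toy manuscripts are all assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two label collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_edges(manuscript_labels, threshold=0.3):
    """Return (manuscript, manuscript, similarity) edges at or above the threshold.

    The threshold is a hypothetical parameter; manuscripts with no
    sufficiently similar partner simply end up without edges.
    """
    edges = []
    for m1, m2 in combinations(sorted(manuscript_labels), 2):
        s = jaccard(manuscript_labels[m1], manuscript_labels[m2])
        if s >= threshold:
            edges.append((m1, m2, s))
    return edges


# Toy label sets, invented for illustration.
manuscript_labels = {
    "ms_A": {"david", "harpe", "couronne"},
    "ms_B": {"david", "harpe"},
    "ms_C": {"oiseau"},
}

edges = similarity_edges(manuscript_labels)
print(edges)
```

Here ms_A and ms_B share two of three distinct labels and are connected, while ms_C remains isolated, mirroring the connected and unconnected manuscripts described in the caption.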