Position Paper: Metadata Enrichment Model: Integrating Neural Networks and Semantic Knowledge Graphs for Cultural Heritage Applications

Jan Ignatowicz, Krzysztof Kutt, Grzegorz J. Nalepa

TL;DR

The paper presents the Metadata Enrichment Model (MEM), a modular framework that integrates fine-tuned computer vision models, large language models (LLMs), and semantic knowledge graphs to enrich metadata for digitized cultural heritage artifacts. Central to MEM is the Multilayer Vision Mechanism (MVM), an iterative workflow that detects nested features (e.g., text within seals) and uses LLM-driven decisions to guide successive analyses, with outputs encoded in RDF and exposed as Linked Data. A proof of concept on incunabula from the Jagiellonian Digital Library includes a manually annotated 105-page dataset and an initial ontology, demonstrating improved metadata interoperability and cross-institution linking via SPARQL endpoints and connections to Wikidata/DBpedia. The work discusses technical, ethical, and practical challenges, such as domain-specific fine-tuning, ontology evolution, and computational costs, and argues that MEM's approach can scale to broader GLAM applications while reinforcing human-in-the-loop validation and community involvement.

Abstract

The digitization of cultural heritage collections has opened new directions for research, yet the lack of enriched metadata poses a substantial challenge to accessibility, interoperability, and cross-institutional collaboration. In recent years, neural network models such as YOLOv11 and Detectron2 have revolutionized visual data analysis, but their application to domain-specific cultural artifacts, such as manuscripts and incunabula, remains limited by the absence of methodologies that address structural feature extraction and semantic interoperability. In this position paper, we argue that the integration of neural networks with semantic technologies represents a paradigm shift in cultural heritage digitization processes. We present the Metadata Enrichment Model (MEM), a conceptual framework designed to enrich metadata for digitized collections by combining fine-tuned computer vision models, large language models (LLMs), and structured knowledge graphs. The key innovation of MEM is the Multilayer Vision Mechanism (MVM), an iterative process that improves visual analysis by dynamically detecting nested features, such as text within seals or images within stamps. To demonstrate MEM's potential, we apply it to a dataset of digitized incunabula from the Jagiellonian Digital Library and release a manually annotated dataset of 105 manuscript pages. We examine the practical challenges of using MEM in real-world GLAM institutions, including the need for domain-specific fine-tuning, the alignment of enriched metadata with Linked Data standards, and computational costs. We present MEM as a flexible and extensible methodology. This paper contributes to the discussion on how artificial intelligence and semantic web technologies can advance cultural heritage research and how these technologies can be used in practice.
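
The iterative, nested detection that the abstract attributes to the MVM can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Region`, `multilayer_pass`, `fake_detect`, and `fake_should_descend` are hypothetical, the detector is a lookup table standing in for a fine-tuned vision model, and the decision function is a fixed rule standing in for the LLM-driven decision step. The sketch only shows the control flow: detect features in a region, decide which ones deserve a closer look, and repeat detection inside them up to a depth limit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in page pixels


@dataclass
class Region:
    """One detected feature and any features found nested inside it."""
    label: str
    box: Box
    children: List["Region"] = field(default_factory=list)


def multilayer_pass(
    box: Box,
    detect: Callable[[Box], List[Tuple[str, Box]]],
    should_descend: Callable[[str], bool],
    max_depth: int = 3,
) -> List[Region]:
    """Detect features inside `box`, then re-run detection inside each
    feature that the decision step marks as worth a closer look."""
    if max_depth == 0:
        return []
    regions: List[Region] = []
    for label, child_box in detect(box):
        region = Region(label, child_box)
        if should_descend(label):
            region.children = multilayer_pass(
                child_box, detect, should_descend, max_depth - 1
            )
        regions.append(region)
    return regions


# Hypothetical stand-in for a vision model: a fixed table of detections
# keyed by the region that was analysed. All coordinates are made up.
FAKE_DETECTIONS: Dict[Box, List[Tuple[str, Box]]] = {
    (0, 0, 1000, 1400): [
        ("stamp", (700, 100, 200, 200)),
        ("paragraph", (50, 400, 900, 800)),
    ],
    (700, 100, 200, 200): [("text", (730, 180, 140, 40))],
}


def fake_detect(box: Box) -> List[Tuple[str, Box]]:
    return FAKE_DETECTIONS.get(box, [])


def fake_should_descend(label: str) -> bool:
    # Assumption: a fixed rule replaces the LLM-driven decision.
    return label in {"stamp", "seal"}


page = multilayer_pass((0, 0, 1000, 1400), fake_detect, fake_should_descend)
```

In this toy run the page-level pass finds a stamp and a paragraph, and only the stamp is analysed a second time, which yields the nested text region. The depth limit is a design choice of the sketch that keeps the recursion bounded; the source does not state how the MVM decides when to stop.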

Paper Structure

This paper contains 24 sections, 5 figures.

Figures (5)

  • Figure 1: Metadata Enrichment Model (MEM) main flow.
  • Figure 2: Possible entry modalities for MEM.
  • Figure 3: An example image from the created dataset. The image shows the detected classes: initial, header, and paragraph.
  • Figure 4: An example image from the created dataset. The image shows the labeled classes: stamp, initial, header, ornament, and paragraph.
  • Figure 5: Example created ontology structure based on found new metadata and provided metadata by institution.