Cloud-based digitization workflow with rich metadata acquisition for cultural heritage objects
Krzysztof Kutt, Luiz do Valle Miranda, Jakub Gomułka, Grzegorz J. Nalepa
TL;DR
The paper presents a cloud-based digitization workflow that enables domain experts to acquire rich metadata for cultural heritage objects using familiar Microsoft 365 tools, culminating in a linked data knowledge graph (TAO) populated from Excel-based metadata. It combines The Knowledge Matrix (TKM), the Standard Entries Catalog (SEC), and a Mapping system (MAP) with Office Script and Python validators, stored in SharePoint and extended into an OWL-based TAO ontology. Evaluation across two pilots and two workshops demonstrates the approach's practicality, yielding a large, queryable knowledge base and confirming usability for the Jagiellonian Library while remaining compatible with existing workflows. The work offers a scalable, accessible methodology for GLAM institutions to enrich metadata and enable interoperable, RDF-based representations of their collections.
Abstract
In response to several cultural heritage initiatives at the Jagiellonian University, we developed a new digitization workflow in collaboration with the Jagiellonian Library (JL). The workflow relies on readily accessible technology (the Microsoft 365 cloud, with MS Excel files as metadata acquisition interfaces, Office Script for validation, and MS SharePoint for storage), which allows domain experts to acquire metadata regardless of their experience with information systems. The ultimate goal is to create a knowledge graph that describes the analyzed collections and is linked both to general knowledge bases and to other cultural heritage collections; careful attention is therefore paid to the accuracy of the metadata and to correct links to external sources. The workflow was evaluated in two pilot studies and two workshops, which allowed us to refine it and to confirm its correctness and usability for JL. The knowledge graph resulting from these pilot studies is available in a public Git repository. Because the proposed workflow does not interfere with an institution's existing systems or domain guidelines for digitization and basic metadata collection, but instead extends them to enable rich metadata collection that was not previously possible, we believe it could be of interest to all GLAMs (galleries, libraries, archives, and museums).
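To make the pipeline described above more concrete, the following is a minimal sketch of its two core steps: checking a row of spreadsheet metadata and serializing a valid row as RDF in N-Triples syntax. It is an illustration only, not the authors' validators or mapping system. The column names (`object_id`, `title`, `creator_uri`), the `http://example.org/tao/` namespace, and the property names are all assumptions made for this example, and the rows are invented sample data standing in for what would be read from an Excel sheet.

```python
from urllib.parse import quote

# Hypothetical namespace; the real TAO ontology IRIs are not given in this text.
BASE = "http://example.org/tao/"

# Hypothetical set of mandatory spreadsheet columns.
REQUIRED = ("object_id", "title")


def validate_row(row):
    """Return a list of human-readable problems found in one metadata row."""
    problems = []
    for field in REQUIRED:
        if not str(row.get(field, "")).strip():
            problems.append(f"missing required field: {field}")
    creator = str(row.get("creator_uri", "")).strip()
    if creator and not creator.startswith(("http://", "https://")):
        problems.append("creator_uri is not an HTTP(S) IRI")
    return problems


def escape_literal(text):
    """Escape backslashes, quotes, and newlines for an N-Triples string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def row_to_ntriples(row):
    """Serialize one validated row as a list of N-Triples statements."""
    object_id = quote(str(row["object_id"]).strip(), safe="")
    subject = f"<{BASE}object/{object_id}>"
    title = escape_literal(str(row["title"]).strip())
    triples = [f'{subject} <{BASE}title> "{title}" .']
    creator = str(row.get("creator_uri", "")).strip()
    if creator:
        # Link to an external resource instead of storing a plain string.
        triples.append(f"{subject} <{BASE}creator> <{creator}> .")
    return triples


# Invented sample rows, standing in for rows read from an Excel sheet.
rows = [
    {"object_id": "BJ-0001", "title": "Sample manuscript",
     "creator_uri": "http://example.org/person/1"},
    {"object_id": "", "title": "Untitled"},
]

for row in rows:
    problems = validate_row(row)
    if problems:
        print("rejected:", problems)
    else:
        print("\n".join(row_to_ntriples(row)))
```

Running validation before serialization means incomplete rows never reach the graph, which mirrors the emphasis on metadata accuracy and correct external links. A production setup would more likely read the sheets with a spreadsheet library and build the graph with an RDF library.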
