MUSEKG: A Knowledge Graph Over Museum Collections

Jinhao Li, Jianzhong Qi, Soyeon Caren Han, Eun-Jung Holden

TL;DR

MuseKG addresses fragmentation in digitised museum data by unifying structured records and multimodal labels into a typed property graph and enabling natural-language querying. The system comprises a KG constructor, which normalises museum records and maps them into a graph of nodes and edges drawn from a fixed relation set, and an NL query interface, which grounds user questions via Retrieval-Augmented Generation over the KG. Empirical evaluation on a 150-question benchmark shows that MuseKG outperforms zero-shot, few-shot, and SPARQL-prompt baselines across attribute, relation, and one-hop reasoning tasks, with substantially lower latency (approximately $0.36$ s per query). The work demonstrates the value of symbolic grounding for interpretable, scalable cultural-heritage reasoning and points to web-scale integration of digital heritage knowledge.

Abstract

Digital transformation in the cultural heritage sector has produced vast yet fragmented collections of artefact data. Existing frameworks for museum information systems struggle to integrate heterogeneous metadata, unstructured documents, and multimodal artefacts into a coherent and queryable form. We present MuseKG, an end-to-end knowledge-graph framework that unifies structured and unstructured museum data through symbolic-neural integration. MuseKG constructs a typed property graph linking objects, people, organisations, and visual or textual labels, and supports natural language queries. Evaluations on real museum collections demonstrate robust performance across queries over attributes, relations, and related entities, surpassing large-language-model zero-shot, few-shot and SPARQL prompt baselines. The results highlight the importance of symbolic grounding for interpretable and scalable cultural heritage reasoning, and pave the way for web-scale integration of digital heritage knowledge.

Paper Structure

This paper contains 6 sections, 2 figures, 2 tables.

Figures (2)

  • Figure 1: System overview of MuseKG. Module 1 constructs the museum collections KG (MuseKG) from records. Module 2 takes a user query, retrieves KG context, and uses an LLM to generate a natural-language answer grounded in MuseKG.
  • Figure 2: Visualisation of an example KG subgraph constructed from a single record. The central object node and its neighbouring nodes (image labels, components, entities, and people) illustrate the attributes that are referenced in the subsequent query example.
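
To make the two modules in the figure captions concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical record layout (`id`, `title`, `makers`, `labels`, `components`) and a hypothetical relation set (`MADE_BY`, `HAS_LABEL`, `HAS_COMPONENT`); the paper's actual schema and relation names are not given in this summary. It builds a small typed property graph from one record and then collects the one-hop neighbourhood of the object node as plain-text facts, which is the kind of context a retrieval step might pass to an LLM.

```python
from dataclasses import dataclass, field

# Hypothetical fixed relation set; the real MuseKG relation names are not stated here.
RELATIONS = {"MADE_BY", "HAS_LABEL", "HAS_COMPONENT"}

# Hypothetical mapping from record fields to (relation, target node type).
FIELD_MAP = [
    ("makers", "MADE_BY", "Person"),
    ("labels", "HAS_LABEL", "Label"),
    ("components", "HAS_COMPONENT", "Component"),
]


@dataclass
class PropertyGraph:
    nodes: dict = field(default_factory=dict)  # node_id -> {"type", "name"}
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node_id, node_type, name):
        self.nodes[node_id] = {"type": node_type, "name": name}

    def add_edge(self, source, relation, target):
        # Reject anything outside the fixed relation set.
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        self.edges.append((source, relation, target))

    def one_hop(self, node_id):
        # Outgoing neighbours of a node, as (relation, target_id) pairs.
        return [(r, t) for s, r, t in self.edges if s == node_id]


def build_graph(record):
    """Module 1 (sketch): normalise one record and map it into the graph."""
    graph = PropertyGraph()
    obj_id = f"object:{record['id']}"
    graph.add_node(obj_id, "Object", record["title"].strip())
    for key, relation, node_type in FIELD_MAP:
        for value in record.get(key, []):
            name = value.strip()  # trivial normalisation step
            target = f"{node_type.lower()}:{name.lower()}"
            graph.add_node(target, node_type, name)
            graph.add_edge(obj_id, relation, target)
    return graph


def context_for(graph, node_id):
    """Module 2 (sketch): render one-hop facts as text for an LLM prompt."""
    subject = graph.nodes[node_id]["name"]
    return [
        f"{subject} {relation} {graph.nodes[target]['name']}"
        for relation, target in graph.one_hop(node_id)
    ]


# Invented example record, for illustration only.
record = {
    "id": "42",
    "title": "Brass sextant",
    "makers": [" Jane Doe "],
    "labels": ["instrument"],
}
graph = build_graph(record)
facts = context_for(graph, "object:42")
```

In this sketch the retrieved `facts` list would be prepended to the user's question before it is sent to a language model; the model call itself is omitted because the summary does not specify which model or prompt format MuseKG uses.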