Graph Queries from Natural Language using Constrained Language Models and Visual Editing
Benedikt Kantz, Kevin Innerebner, Peter Waldert, Stefan Lengauer, Elisabeth Lex, Tobias Schreck
TL;DR
To address the barrier of querying large ontologies, the paper maps natural language to constrained prototype graphs and offers a visual editor for refinement. It introduces a two-step graph extraction process that uses ontology-aware constraints to guarantee SPARQL validity. The authors validate the approach with synthetic benchmarks across several ontologies and a preliminary user study, showing competitive retrieval performance with small models and clear usability benefits. This work enables non-experts to formulate and refine KG queries without deep ontology expertise, reducing reliance on heavy architectures and iterative debugging.
Abstract
Knowledge bases built on ontologies are usually queried through dedicated query languages, question-answering systems, or visual query editors for knowledge graphs. We propose a novel approach that lets users query a knowledge graph by specifying prototype graphs in natural language and then editing them visually. This enables non-experts to formulate queries without prior knowledge of the ontology or of a specific query language. Our approach converts natural language into prototype graphs through a two-step constrained language model generation based on semantically similar features within the ontology. The resulting prototype graph serves as the starting point for further user refinement in a dedicated visual query builder. Unlike related language-model approaches, which often need multiple iterations to fix invalid syntax, non-existent classes, or non-existent links, our approach consistently generates a valid SPARQL query within the constraints imposed by the ontology, with no additional corrections to the syntax, classes, or links used. We evaluate the system on graph retrieval with synthetic queries, comparing multiple metrics, models, and ontologies, and further validate it through a preliminary user study. We show that the constrained pipeline achieves efficient and accurate retrieval while relying on smaller, more efficient models than other approaches.
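To make the idea of an ontology-constrained prototype graph concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy ontology (`CLASSES`, `LINKS`) and hypothetical helper names (`Edge`, `allowed_links`, `validate`, `to_sparql`). Here `allowed_links` stands in for the set of options a constrained generation step would be limited to, and `to_sparql` shows why a graph built only from such options translates into a query that uses existing classes and links.

```python
from dataclasses import dataclass

# Hypothetical toy ontology (an assumption for illustration only):
# the known classes and the links allowed between them.
CLASSES = {"Person", "Paper", "Venue"}
LINKS = {
    ("Person", "authorOf", "Paper"),
    ("Paper", "publishedIn", "Venue"),
}


@dataclass(frozen=True)
class Edge:
    source: str  # node name in the prototype graph
    link: str    # link name from the ontology
    target: str  # node name in the prototype graph


def allowed_links(source_class, target_class):
    """Return the links the ontology permits between two classes."""
    return sorted(
        link for (dom, link, rng) in LINKS
        if dom == source_class and rng == target_class
    )


def validate(nodes, edges):
    """Raise ValueError for any class or link outside the ontology."""
    for name, cls in nodes.items():
        if cls not in CLASSES:
            raise ValueError(f"unknown class {cls!r} for node {name!r}")
    for edge in edges:
        if edge.source not in nodes or edge.target not in nodes:
            raise ValueError(f"edge {edge} references an unknown node")
        options = allowed_links(nodes[edge.source], nodes[edge.target])
        if edge.link not in options:
            raise ValueError(f"link {edge.link!r} not allowed; options: {options}")


def to_sparql(nodes, edges, prefix="http://example.org/onto#"):
    """Translate a validated prototype graph into a SPARQL SELECT query."""
    validate(nodes, edges)
    lines = [
        f"PREFIX : <{prefix}>",
        "SELECT " + " ".join(f"?{name}" for name in nodes),
        "WHERE {",
    ]
    # One type pattern per node, one triple pattern per edge.
    for name, cls in nodes.items():
        lines.append(f"  ?{name} a :{cls} .")
    for edge in edges:
        lines.append(f"  ?{edge.source} :{edge.link} ?{edge.target} .")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Prototype graph for a request such as "people who wrote a paper".
    nodes = {"person": "Person", "paper": "Paper"}
    edges = [Edge("person", "authorOf", "paper")]
    print(to_sparql(nodes, edges))
```

In this sketch the check happens after the graph is built; in a constrained generation setting, the same ontology lookup would instead restrict what the model is able to emit, so the invalid cases would never arise in the first place.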
