
Graph Queries from Natural Language using Constrained Language Models and Visual Editing

Benedikt Kantz, Kevin Innerebner, Peter Waldert, Stefan Lengauer, Elisabeth Lex, Tobias Schreck

TL;DR

To address the barrier of querying large ontologies, the paper maps natural language to constrained prototype graphs and offers a visual editor for refinement. It introduces a two-step graph extraction process that uses ontology-aware constraints to guarantee SPARQL validity. The authors validate the approach with synthetic benchmarks across several ontologies and a preliminary user study, showing competitive retrieval performance with small models and clear usability benefits. This work enables non-experts to formulate and refine KG queries without deep ontology expertise, reducing reliance on heavy architectures and iterative debugging.
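The validity guarantee can be made concrete with a small sketch. It translates a prototype graph into a SPARQL basic graph pattern and refuses any class or link that the ontology does not define. Several pieces are hypothetical illustrations, not the authors' implementation:

- the toy ontology and the `ex:` prefix;
- the names `PrototypeGraph`, `validate`, and `to_sparql`.

The paper enforces its constraints during language model generation. Here they are only checked after the fact.

```python
from dataclasses import dataclass, field

# Hypothetical toy ontology (not from the paper): allowed classes, and each
# property's (domain, range) pair.
ONTOLOGY_CLASSES = {"Person", "University", "Country"}
ONTOLOGY_PROPERTIES = {
    "child": ("Person", "Person"),
    "almaMater": ("Person", "University"),
    "country": ("University", "Country"),
}


@dataclass
class PrototypeGraph:
    nodes: dict[str, str]  # SPARQL variable name -> ontology class
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (subject var, property, object var)


def validate(graph: PrototypeGraph) -> None:
    """Raise ValueError if the graph uses a class or link the toy ontology lacks."""
    for var, cls in graph.nodes.items():
        if cls not in ONTOLOGY_CLASSES:
            raise ValueError(f"unknown class {cls!r} for ?{var}")
    for s, p, o in graph.edges:
        if s not in graph.nodes or o not in graph.nodes:
            raise ValueError(f"edge {p!r} refers to an undeclared variable")
        if p not in ONTOLOGY_PROPERTIES:
            raise ValueError(f"unknown property {p!r}")
        domain, range_cls = ONTOLOGY_PROPERTIES[p]
        if (graph.nodes[s], graph.nodes[o]) != (domain, range_cls):
            raise ValueError(f"{p!r} cannot link {graph.nodes[s]} to {graph.nodes[o]}")


def to_sparql(graph: PrototypeGraph, prefix: str = "http://example.org/ontology/") -> str:
    """Emit a SELECT query whose basic graph pattern mirrors the prototype graph."""
    validate(graph)
    head = "SELECT " + " ".join(f"?{v}" for v in graph.nodes) + " WHERE {"
    body = [f"  ?{v} a ex:{c} ." for v, c in graph.nodes.items()]
    body += [f"  ?{s} ex:{p} ?{o} ." for s, p, o in graph.edges]
    return "\n".join([f"PREFIX ex: <{prefix}>", head, *body, "}"])


# Figure 1 example: a person and their child share the same alma mater.
example = PrototypeGraph(
    nodes={"p1": "Person", "p2": "Person", "u": "University"},
    edges=[("p1", "child", "p2"), ("p1", "almaMater", "u"), ("p2", "almaMater", "u")],
)
print(to_sparql(example))
```

The phrase "the same university" is encoded by reusing the variable `?u` for both alma mater links. A visual editor only has to add or remove nodes and edges in such a structure to change the query.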

Abstract

Querying knowledge bases using ontologies is usually performed with dedicated query languages, question-answering systems, or visual query editors for Knowledge Graphs. We propose a novel approach that enables users to query the knowledge graph by specifying prototype graphs in natural language and visually editing them. This approach enables non-experts to formulate queries without prior knowledge of the ontology or of specific query languages. Our approach converts natural language into these prototype graphs using a two-step constrained language model generation based on semantically similar features within an ontology. The resulting prototype graph serves as the building block for further user refinements within a dedicated visual query builder. Our approach consistently generates a valid SPARQL query within the constraints imposed by the ontology, without requiring any additional corrections to the syntax or to the classes and links used. Related language model approaches, by contrast, often require multiple iterations to fix invalid syntax, non-existent classes, and non-existent links. We evaluate the performance of our system using graph retrieval on synthetic queries, comparing multiple metrics, models, and ontologies, and further validate it through a preliminary user study. We show that our constrained pipeline enables efficient and accurate retrieval with more efficient models than other approaches.
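The second generation step depends on finding ontology terms that are semantically close to the model's first guess. The sketch below shows that retrieval step under some assumptions:

- Character-trigram count vectors stand in for the learned text embeddings a real system would use.
- `trigram_vector`, `top_k_candidates`, and `restricted_vocabulary` are hypothetical names.
- The vocabulary list is a toy example.

The returned subset is what would constrain the second, ontology-aligned generation pass.

```python
import math
from collections import Counter


def trigram_vector(text: str) -> Counter:
    """Stand-in for a learned embedding: character trigram counts of the lowercased alphanumerics."""
    s = "#" + "".join(ch for ch in text.lower() if ch.isalnum()) + "#"
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k_candidates(guess: str, vocabulary: list[str], k: int = 3) -> list[str]:
    """Rank ontology terms by similarity to a raw (possibly invalid) guess and keep the top k."""
    q = trigram_vector(guess)
    ranked = sorted(vocabulary, key=lambda term: cosine(q, trigram_vector(term)), reverse=True)
    return ranked[:k]


def restricted_vocabulary(guesses: list[str], vocabulary: list[str], k: int = 3) -> set[str]:
    """Union of the top-k matches for every guessed class or link label."""
    allowed: set[str] = set()
    for g in guesses:
        allowed.update(top_k_candidates(g, vocabulary, k))
    return allowed


vocab = ["Person", "University", "Country", "child", "almaMater", "birthPlace", "parent"]
# Raw labels from a first, unconstrained pass; none of them exists verbatim in the vocabulary.
print(restricted_vocabulary(["children", "universities", "alma_mater"], vocab, k=2))
```

Restricting the second pass to this small candidate set keeps the constrained decoding space manageable even for large ontologies. The cost is that a correct term missing from the top k can no longer be generated.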

Paper Structure

This paper contains 22 sections, 3 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Our query extraction process, illustrated with the query example "a person and the child of a person have the alma mater of the same university". We transform the query into a prototype graph using a constrained language model. The graph is first approximated by the language model $(a)$, where the generated classes and links might not yet match the ontology. This initial guess of the graph is used to search for semantically similar relations $(b)$. With this subset of all possible links and nodes, the graph is extracted again and corrected for possible errors $(c)$, resulting in a graph that adheres to the ontology. The resulting graph can be edited $(d)$; for example, an additional constraint for a country can be added. The final prototype graph can then be used to query a knowledge graph and retrieve instances $(e)$.
  • Figure 2: The user interface [Kantz2025OnSET], showing the query for the example from Figure 1. The user can edit the prototype graph $G_p$ on the left, view the connected nodes at the bottom in a circle-packing visualization of the ontology, and inspect the instances $G_{I,h}$ directly on the right.
  • Figure 3: $F_{1,\textit{node}}$, $F_{1,\textit{rel}}$, and $\textit{GED}_{s}$ on four different ontologies, comparing different models, node counts $k$, and the raw versus aligned (constrained) output.
  • Figure 4: Prototype graph $G_{p,s}$ sampled from the ontology using instance counts within the knowledge graph.
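The node and relation F1 scores in Figure 3 compare a generated prototype graph against a reference graph. The summary does not specify how nodes and relations are matched, so the sketch below rests on two assumptions:

- Nodes are compared as sets of class labels.
- Relations are compared as sets of class-level (domain, property, range) triples.

The function names are hypothetical, and $\textit{GED}_{s}$ is not reproduced.

```python
def f1(predicted: set, reference: set) -> float:
    """Harmonic mean of precision and recall over two sets."""
    if not predicted and not reference:
        return 1.0  # both empty: treat as a perfect match
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


def node_and_relation_f1(pred_nodes, pred_edges, ref_nodes, ref_edges):
    """Compare two prototype graphs at the class level.

    nodes: dict variable -> class; edges: list of (subject var, property, object var).
    Variable names are ignored, so repeated classes or triples collapse into one set item.
    """
    def classes(nodes):
        return set(nodes.values())

    def relations(nodes, edges):
        return {(nodes[s], p, nodes[o]) for s, p, o in edges}

    f1_node = f1(classes(pred_nodes), classes(ref_nodes))
    f1_rel = f1(relations(pred_nodes, pred_edges), relations(ref_nodes, ref_edges))
    return f1_node, f1_rel


# Reference: the Figure 1 query; prediction: wrong target class, one alma mater link missing.
ref_nodes = {"p1": "Person", "p2": "Person", "u": "University"}
ref_edges = [("p1", "child", "p2"), ("p1", "almaMater", "u"), ("p2", "almaMater", "u")]
pred_nodes = {"a": "Person", "b": "Person", "o": "Organisation"}
pred_edges = [("a", "child", "b"), ("a", "almaMater", "o")]
print(node_and_relation_f1(pred_nodes, pred_edges, ref_nodes, ref_edges))
```

Working at the class level makes the score independent of variable naming. It also means the set-based view cannot tell whether a graph has one or two alma mater links, which a graph edit distance would capture.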
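Figure 4 samples synthetic prototype graphs from the ontology using instance counts. One plausible reading is sketched below:

- Pick a start class weighted by its instance count.
- Repeatedly attach a link weighted by how many instance-level triples use it.

Some parts are assumptions, not the paper's procedure:

- the counts;
- the expansion strategy;
- the names `CLASS_COUNTS`, `LINK_COUNTS`, and `sample_prototype_graph`.

```python
import random

# Hypothetical per-class and per-link instance counts (made-up numbers, not
# from the paper); a real system would query them from the knowledge graph.
CLASS_COUNTS = {"Person": 1000, "University": 50, "Country": 20}
LINK_COUNTS = {  # (domain class, property, range class) -> number of matching triples
    ("Person", "child", "Person"): 300,
    ("Person", "almaMater", "University"): 400,
    ("University", "country", "Country"): 50,
}


def sample_prototype_graph(n_edges: int, rng: random.Random):
    """Grow a prototype graph by random expansion weighted by instance counts."""
    classes = list(CLASS_COUNTS)
    start = rng.choices(classes, weights=[CLASS_COUNTS[c] for c in classes], k=1)[0]
    nodes = {"n0": start}
    edges = []
    for _ in range(n_edges):
        # Every link whose domain matches a class already in the graph is a candidate.
        candidates = [(var, link) for var, cls in nodes.items()
                      for link in LINK_COUNTS if link[0] == cls]
        if not candidates:
            break  # dead end, e.g. a lone Country node has no outgoing links here
        weights = [LINK_COUNTS[link] for _, link in candidates]
        var, (_, prop, range_cls) = rng.choices(candidates, weights=weights, k=1)[0]
        new_var = f"n{len(nodes)}"
        nodes[new_var] = range_cls
        edges.append((var, prop, new_var))
    return nodes, edges


sampled_nodes, sampled_edges = sample_prototype_graph(3, random.Random(0))
print(sampled_nodes, sampled_edges)
```

Weighting by instance counts biases the synthetic queries toward graphs that actually have matches in the knowledge graph. Uniform sampling over the ontology would instead produce many queries with empty results.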