SPARQL Generation with Entity Pre-trained GPT for KG Question Answering

Diego Bustamante, Hideaki Takeda

TL;DR

The paper tackles knowledge graph question answering (KGQA) by translating natural-language questions into SPARQL queries. It uses a two-stage pipeline that first performs entity linking (EL) and then generates SPARQL with a GPT-based model. A key contribution is pre-training the model on all KG entities under a Closed World Assumption (CWA) to improve few-shot generalization, combined with replacing entity mentions by their IRIs to shrink the vocabulary. Empirically, pre-training markedly boosts 1-shot and 3-shot SPARQL accuracy (e.g., Acc@1 rising from about 31.9% to 49.2%; Acc@3 from about 43.8% to 62.7%), while zero-shot performance remains limited because the model struggles to comprehend entities it has never seen in a query; EL performance remains a strong predictor of overall success. The work suggests that a compact model with targeted pre-training and a controlled EL step can approach state-of-the-art behavior for KGQA with limited data and computational resources, offering a practical route for deploying user-friendly KG querying tools.
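To make the two ideas in this summary concrete, here is a minimal sketch of (a) replacing linked entity mentions with IRIs before generation and (b) scoring generated queries by exact match. It is not the authors' implementation: the function names, the `example.org` IRIs, the angle-bracket IRI format, and the whitespace normalization used in the comparison are all assumptions made for illustration, and the entity links are taken as given rather than produced by an EL model.

```python
import re


def replace_mentions_with_iris(question, links):
    """Replace each linked mention in `question` with its IRI in angle brackets.

    `links` maps a surface mention to an IRI (hypothetical EL output).
    """
    if not links:
        return question
    lowered = {mention.lower(): iri for mention, iri in links.items()}
    # Longest mentions first, so a longer mention wins over a shorter prefix.
    ordered = sorted(links, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(m) for m in ordered), re.IGNORECASE)
    # Single pass over the question, so inserted IRIs are never re-matched.
    return pattern.sub(lambda m: f"<{lowered[m.group(0).lower()]}>", question)


def normalize(query):
    """Collapse runs of whitespace (an assumed, deliberately simple normalization)."""
    return " ".join(query.split())


def exact_match_accuracy(predicted, gold):
    """Fraction of predicted queries equal to the gold query after normalization."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    hits = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return hits / len(gold)


if __name__ == "__main__":
    links = {"Dune": "http://example.org/entity/Dune"}  # hypothetical link
    print(replace_mentions_with_iris("Who wrote Dune?", links))
```

Exact string match is a strict metric: two queries that return the same answers but differ in variable names or triple order count as a miss, which is one reason the entity-linking step matters so much for the final score.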

Abstract

The popularity of Knowledge Graphs has grown rapidly in recent years. All of that knowledge is available for people to query through the many online databases on the internet. However, it would be a great achievement if non-programmer users could access whatever information they want to know. A great deal of effort has been directed at this task using natural language processing tools, and creative solutions have been encouraged through many challenges. Our approach assumes correct entity linking on the natural language questions and trains a GPT model to create SPARQL queries from them. We managed to isolate which property of the task can be the most difficult to solve in few-shot or zero-shot settings, and we propose pre-training on all entities (under CWA) to improve performance. We obtained 62.703% accuracy of exact SPARQL matches on testing at 3-shots, an F1 of 0.809 on the entity linking challenge, and an F1 of 0.009 on the question answering challenge.

Paper Structure (7 sections, 1 figure, 2 tables)

Figures (1)

  • Figure 1: Our solution: the components of our model and how to integrate it in a production setting.