
Towards Enhancing Linked Data Retrieval in Conversational UIs using Large Language Models

Omar Mussa, Omer Rana, Benoît Goossens, Pablo Orozco-Terwengel, Charith Perera

TL;DR

The paper investigates enhancing Linked Data (LD) retrieval in conversational UIs by integrating Large Language Models (LLMs) to improve entity extraction and RDF query formulation without model retraining. A hybrid pipeline combines LLMs with RDF embeddings and with subgraphs created via RDF walks, using JSON prompt templates to obtain structured outputs. Two use cases (freely applying sensor-property filters, and querying RDF schemas) demonstrate increased expressivity and accuracy, although direct SPARQL generation by LLMs remains unreliable in general. Experiments with multiple LLMs on a SOSA-based dataset show that few-shot prompting substantially improves results, with GPT-4-turbo achieving perfect accuracy on Use Case 2, which suggests practical value for web information systems.
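The TL;DR mentions subgraphs created via RDF walks. Below is a minimal sketch of such a walk, assuming a bounded breadth-first traversal that follows edges in both directions from a seed entity. The paper's exact walk strategy and depth are not given here, and the SOSA-style example triples and the `rdf_walk` function are hypothetical, for illustration only.

```python
from collections import deque

# Hypothetical SOSA-style triples (subject, predicate, object); not the paper's dataset.
TRIPLES = [
    ("ex:sensor1", "rdf:type", "sosa:Sensor"),
    ("ex:sensor1", "sosa:observes", "ex:temperature"),
    ("ex:sensor1", "sosa:isHostedBy", "ex:platformA"),
    ("ex:obs1", "sosa:madeBySensor", "ex:sensor1"),
    ("ex:obs1", "sosa:hasSimpleResult", "23.5"),
    ("ex:temperature", "rdf:type", "sosa:ObservableProperty"),
    ("ex:platformA", "rdf:type", "sosa:Platform"),
]


def rdf_walk(triples, seed, max_depth=1):
    """Collect the triples reachable from `seed` within `max_depth` hops.

    Hypothetical helper: edges are followed in both directions
    (subject -> object and object -> subject), breadth-first.
    """
    # Index every triple under both of its end nodes so neighbours are cheap to find.
    index = {}
    for t in triples:
        s, _, o = t
        index.setdefault(s, []).append(t)
        index.setdefault(o, []).append(t)

    subgraph = set()
    visited = {seed}
    frontier = deque([(seed, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth >= max_depth:
            continue  # do not expand beyond the hop limit
        for t in index.get(node, []):
            subgraph.add(t)
            s, _, o = t
            neighbour = o if s == node else s
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return subgraph
```

With `max_depth=1`, the walk from `ex:sensor1` returns the four triples that touch the sensor directly; with `max_depth=2` it reaches the whole toy graph. In a real triplestore, walking through class nodes such as `sosa:Sensor` would pull in every instance of that class, so a practical walk would likely cap or skip such hub nodes.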

Abstract

Despite the recent broad adoption of Large Language Models (LLMs) across various domains, their potential for enriching information systems that extract and explore Linked Data (LD) and Resource Description Framework (RDF) triplestores has not been extensively explored. This paper examines the integration of LLMs within existing systems, focusing on conversational user interfaces (UIs) and on improving their data-extraction capabilities by producing more accurate SPARQL queries without the need for model retraining. Conventional conversational UI models typically require retraining whenever new datasets or updates are introduced, which limits their usefulness as general-purpose extraction tools. Our approach addresses this limitation by incorporating LLMs into the conversational UI workflow, improving the system's ability to interpret and process user queries. By leveraging the natural language understanding capabilities of LLMs, our method improves RDF entity extraction within web systems that employ conventional chatbots. This integration enables a more nuanced, context-aware interaction model, which is critical for handling the complex query patterns often encountered in RDF datasets and Linked Open Data (LOD) endpoints. Our evaluation shows a marked improvement in system expressivity and in the accuracy of responses to user queries, indicating a promising direction for future research. This investigation underscores the versatility of LLMs in enhancing existing information systems and sets the stage for further exploration of their applications in more specialised domains of web information systems.
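Since direct SPARQL generation by LLMs is described as unreliable, one way to obtain more accurate queries is to let the LLM return only structured JSON and to assemble the SPARQL query in code. The sketch below assumes a hypothetical JSON shape with `property`, `operator`, and `value` keys, a hypothetical `llm_json_to_sparql` helper, and a fixed SOSA query skeleton; none of these are taken from the paper.

```python
import json
import math
import re

# Hypothetical JSON shape the LLM is asked to return (an assumption, not the
# paper's template): {"property": "<IRI>", "operator": ">", "value": 20}
ALLOWED_OPERATORS = {"<", "<=", ">", ">=", "="}
# Absolute http(s) IRI without characters that could break out of <...> in SPARQL.
IRI_PATTERN = re.compile(r'https?://[^\s<>"{}|^`\\]+')

PREFIXES = (
    "PREFIX sosa: <http://www.w3.org/ns/sosa/>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def llm_json_to_sparql(raw: str) -> str:
    """Validate the LLM's JSON answer and slot it into a fixed SPARQL SELECT."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed output
    prop, op = data["property"], data["operator"]
    value = float(data["value"])
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {op!r}")
    if not IRI_PATTERN.fullmatch(prop):
        raise ValueError(f"not an absolute IRI: {prop!r}")
    if not math.isfinite(value):
        raise ValueError("value must be finite")
    # Only the slots come from the LLM; the query structure is fixed in code.
    return (
        PREFIXES
        + "SELECT ?sensor ?result WHERE {\n"
        + f"  ?sensor sosa:observes <{prop}> .\n"
        + "  ?obs sosa:madeBySensor ?sensor ;\n"
        + "       sosa:hasSimpleResult ?result .\n"
        + f"  FILTER(xsd:decimal(?result) {op} {value})\n"
        + "}"
    )
```

Keeping the query skeleton in code means a malformed or adversarial LLM answer can at worst fail validation, rather than produce arbitrary SPARQL.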

Paper Structure

This paper contains 32 sections, 3 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: A conceptual representation of the conversational UI workflow within the current system. The blue dashed arrows illustrate the proposed enhancements to improve query formulation and information retrieval processes.
  • Figure 2: Illustration of Use Case 1: A scenario demonstrating the limitations of conventional entity extractors.
  • Figure 3: An example of the first iteration of RDF Walk showing the first generated subgraph.
  • Figure 4: A summary of the workflow for generating LLM prompts: It begins with the conversion of user queries into vector representations, followed by similarity calculations to identify relevant RDF triples. These triples form a subgraph that, combined with a prompt template, aids in producing context-aware prompts.
  • Figure 5: Accuracy of selected LLMs for Use Case 1, categorised by question group (G) and test type (few-shot or zero-shot). Green indicates higher accuracy, while red shows lower accuracy.
  • ...and 1 more figure
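The workflow summarised in Figure 4 (embed the user query, score RDF triples by similarity, keep the most relevant ones as a subgraph, and combine them with a prompt template) can be sketched as follows. This is a minimal illustration under stated assumptions: a toy bag-of-words vector stands in for the RDF embeddings used in the paper, and the triples, the value of `k`, the prompt wording, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical triples; in the paper these would come from the RDF store / RDF walks.
TRIPLES = [
    ("ex:sensor1", "sosa:observes", "ex:airTemperature"),
    ("ex:sensor2", "sosa:observes", "ex:relativeHumidity"),
    ("ex:sensor1", "sosa:isHostedBy", "ex:weatherStation"),
    ("ex:obs7", "sosa:madeBySensor", "ex:sensor2"),
]

# Hypothetical prompt template that asks the LLM for JSON output.
PROMPT_TEMPLATE = (
    "You are given RDF triples:\n{context}\n"
    "Question: {question}\n"
    'Answer only with JSON of the form {{"entities": [...]}}.'
)


def tokens(text: str) -> list[str]:
    """Split prefixed names and camelCase into lowercase word tokens."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)  # airTemperature -> air Temperature
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for real RDF/text embeddings."""
    return Counter(tokens(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_prompt(question: str, triples, k: int = 2) -> str:
    """Rank triples against the question, keep the top k, and fill the template."""
    q_vec = embed(question)
    ranked = sorted(triples, key=lambda t: cosine(q_vec, embed(" ".join(t))), reverse=True)
    context = "\n".join(" ".join(t) + " ." for t in ranked[:k])
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

For the question "Which sensor observes air temperature?" with `k=1`, only the `ex:airTemperature` triple reaches the prompt. Swapping `embed` for a real embedding model would change the scores but not the structure of the pipeline.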