A System and Benchmark for LLM-based Q&A on Heterogeneous Data

Achille Fokoue; Srideepika Jayaraman; Elham Khabiri; Jeffrey O. Kephart; Yingjie Li; Dhruv Shah; Youssef Drissi; Fenno F. Heath; Anu Bhamidipaty; Fateh A. Tipu; Robert J. Baseman

TL;DR

This work tackles NL question answering over heterogeneous industrial data sources by introducing siwarex, a framework that unifies databases and APIs through a relational schema where APIs appear as virtual tables and are invoked via user-defined functions. It combines a ReAct-based NL-to-SQL pipeline, a Table Selector, a Query Rewriter, and guardrails to ensure correct routing and execution, enabling seamless API and DB interactions. To evaluate performance under varying data heterogeneity, the authors extend the Spider benchmark by replacing a configurable fraction of DB tables with API proxies, creating a spectrum from pure DB access to pure API access. Experimental results show siwarex maintains higher execution accuracy than a strong API+DB baseline as heterogeneity increases, demonstrating practical viability for industry-grade NL Q&A over mixed data sources. The work also commits to releasing the modified Spider benchmark to foster further research in heterogeneous data access using LLMs.
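To make the "API as a virtual table, invoked via a user-defined function" idea concrete, the following is a minimal sketch using Python's built-in `sqlite3` module. It is not the siwarex implementation: the `events` table, the `api_get_cascade` function name, and the in-memory dictionary standing in for a remote API are all hypothetical, chosen only to echo the example question from Figure 2.

```python
import sqlite3

# Hypothetical stand-in for a remote data-retrieval API (no network call is made).
_FAKE_API = {"3CNF01": "cascade-A", "3CNF02": "cascade-A", "7XYZ09": "cascade-B"}


def get_cascade(tag):
    """Pretend API endpoint: return the cascade of a tag, or None if unknown."""
    return _FAKE_API.get(tag)


conn = sqlite3.connect(":memory:")

# An ordinary database table (hypothetical schema).
conn.execute("CREATE TABLE events (id INTEGER, tag TEXT, priority TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "3CNF02", "high"), (2, "7XYZ09", "high"), (3, "3CNF01", "low")],
)

# Register the API wrapper as a SQL user-defined function, so that generated
# SQL can call the API with the same syntax it uses for database columns.
conn.create_function("api_get_cascade", 1, get_cascade)

# One SQL statement now mixes database access and API calls.
rows = conn.execute(
    "SELECT id, tag FROM events "
    "WHERE priority = 'high' "
    "AND api_get_cascade(tag) = api_get_cascade('3CNF01') "
    "ORDER BY id"
).fetchall()
print(rows)
```

The point of the design is that an NL-to-SQL model only has to emit SQL; whether a value comes from a stored table or from an API call is hidden behind the function boundary.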

Abstract

In many industrial settings, users wish to ask questions whose answers may be found in structured data sources such as spreadsheets, databases, APIs, or combinations thereof. Often, the user doesn't know how to identify or access the right data source. This problem is compounded even further if multiple (and potentially siloed) data sources must be assembled to derive the answer. Recently, various Text-to-SQL applications that leverage Large Language Models (LLMs) have addressed some of these problems by enabling users to ask questions in natural language. However, these applications remain impractical in realistic industrial settings because they fail to cope with the data source heterogeneity that typifies such environments. In this paper, we address heterogeneity by introducing the siwarex platform, which enables seamless natural language access to both databases and APIs. To demonstrate the effectiveness of siwarex, we extend the popular Spider dataset and benchmark by replacing some of its tables with data retrieval APIs. We find that siwarex does a good job of coping with data source heterogeneity. Our modified Spider benchmark will soon be available to the research community.
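The benchmark construction (replacing a configurable fraction of a database's tables with API proxies) can be sketched as below. This is only an illustration under stated assumptions: the source does not say how tables are selected, so the seeded random sampling, the `split_tables` helper, and the table names are all hypothetical.

```python
import random


def split_tables(tables, api_fraction, seed=0):
    """Split table names into (db_tables, api_tables).

    round(api_fraction * len(tables)) tables are moved to the API side.
    Assumption: tables are picked uniformly at random with a fixed seed.
    """
    if not 0.0 <= api_fraction <= 1.0:
        raise ValueError("api_fraction must be in [0, 1]")
    k = round(api_fraction * len(tables))
    # Sample from a sorted copy so the result depends only on the seed.
    api = set(random.Random(seed).sample(sorted(tables), k))
    db = [t for t in tables if t not in api]
    return db, sorted(api)


# Illustrative table names for one database.
tables = ["stadium", "singer", "concert", "singer_in_concert"]

for fraction in (0.0, 0.5, 1.0):
    db_tables, api_tables = split_tables(tables, fraction)
    print(fraction, "DB:", db_tables, "API:", api_tables)
```

Sweeping `api_fraction` from 0.0 to 1.0 yields the spectrum from pure database access to pure API access.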

Paper Structure (7 sections, 8 figures)

Introduction
Related work
The siwarex Framework
New benchmark datasets
Evaluation
Conclusion
Limitations

Figures (8)

Figure 1: Example of schemas and table view used by siwarex. The Abstract Schema and API Mapping Schema required by siwarex can be provided manually or extracted from domain metadata. If a database schema is provided, the Abstract Schema can be extracted from it automatically; likewise the API Mapping Schema can be extracted automatically from an OpenAPI spec. For systems that mix DB access and API calls, the edges between API and DB nodes in the Abstract Schema may be augmented by a minimal amount of expert knowledge. Once the Abstract Schema is created, a relational schema (DB Table View) is generated from it automatically. The DB Table View represents all entities consistently as tables regardless of whether they are actually tables or APIs.
Figure 2: Runtime view of the main siwarex components described in the section "The siwarex Framework". In this example, siwarex produces an answer in response to "Show high priority events from February 22 involving tags from the same cascade as 3CNF01." If the user opts to receive intermediate explanations of the system's thought process, the explainer routes intermediate results through the DB engine; like the final output, these results can then be rendered as text, tables, or other graphical formats.
Figure 3: Accuracy comparison over all questions, regardless of difficulty.
Figure 4: Accuracy comparison for questions of easy difficulty.
Figure 5: Accuracy comparison for questions of extra hard difficulty.
...and 3 more figures
