
TrustUQA: A Trustful Framework for Unified Structured Data Question Answering

Wen Zhang, Long Jin, Yushan Zhu, Jiaoyan Chen, Zhiwei Huang, Junjie Wang, Yin Hua, Lei Liang, Huajun Chen

TL;DR

TrustUQA introduces the Condition Graph ($CG$) as a unified, expressive representation that supports QA over tables, knowledge graphs, and temporal knowledge graphs. Its core is a two-layer CG query framework: an LLM writes a simple, human-readable CG query ($Q_{llm}$), which is then translated into an executable CG query ($Q_{exe}$) by predefined rules, with semantic mapping aided by embedding models. A Dynamic Demonstration Retriever ($R$) selects the most relevant demonstrations to prompt the LLM, boosting accuracy without fine-tuning. Evaluations on WikiSQL, WTQ, WebQSP, MetaQA, and CronQuestion show competitive or state-of-the-art performance on multiple benchmarks, and illustrate TrustUQA's potential for QA over mixed structured data and QA across structured data. Overall, TrustUQA offers a trustworthy, interpretable, and extensible framework for unified QA across heterogeneous data sources.
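The idea behind dynamic demonstration retrieval can be illustrated with a minimal sketch: given a new question, rank a pool of (question, query) demonstrations by similarity and keep the top k for the prompt. Everything here is an assumption made for illustration: the function names, the toy bag-of-words similarity (a stand-in for a real embedding model), and the placeholder query strings, which are not actual CG queries.

```python
import math
from collections import Counter


def bow(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_demonstrations(question, pool, k=2):
    """Return the k (question, query) pairs most similar to `question`."""
    q = bow(question)
    ranked = sorted(pool, key=lambda d: cosine(q, bow(d[0])), reverse=True)
    return ranked[:k]


# Hypothetical demonstration pool; the query strings are placeholders.
pool = [
    ("who directed the movie Inception", "hypothetical query A"),
    ("how many players scored over 20 points", "hypothetical query B"),
    ("who acted in the movie Titanic", "hypothetical query C"),
]

top = retrieve_demonstrations("who directed the movie Titanic", pool, k=2)
for demo_question, demo_query in top:
    print(demo_question, "->", demo_query)
```

The selected pairs would then be placed in the LLM prompt ahead of the new question, so the model sees examples that resemble the query it has to write.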

Abstract

Natural language question answering (QA) over structured data sources such as tables and knowledge graphs has been widely investigated, especially with Large Language Models (LLMs) in recent years. The main solutions include question-to-formal-query parsing and retrieval-based answer generation. However, current methods of the former often suffer from weak generalization, failing to deal with multiple types of sources, while the latter is limited in trustfulness. In this paper, we propose TrustUQA, a trustful QA framework that can simultaneously support multiple types of structured data in a unified way. To this end, it adopts an LLM-friendly and unified knowledge representation method called Condition Graph (CG), and uses an LLM and demonstration-based two-level method for CG querying. For enhancement, it is also equipped with dynamic demonstration retrieval. We have evaluated TrustUQA with 5 benchmarks covering 3 types of structured data. It outperforms 2 existing unified structured data QA methods. In comparison with the baselines that are specific to one data type, it achieves state-of-the-art on 2 of the datasets. Furthermore, we have demonstrated the potential of our method for more general QA tasks: QA over mixed structured data and QA across structured data. The code is available at https://github.com/zjukg/TrustUQA.
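One step of the two-level querying method is translating terms written by the LLM into terms that actually occur in the data. The sketch below illustrates only that mapping step, under loud assumptions: it uses character-level similarity from Python's `difflib` instead of an embedding model, and the function name and vocabulary are hypothetical rather than taken from the TrustUQA code.

```python
from difflib import SequenceMatcher


def map_to_vocabulary(term, vocabulary):
    """Map an LLM-written term to the closest term in the data vocabulary.

    String similarity is used here as a simple stand-in for the
    embedding-based similarity described in the text.
    """
    def score(candidate):
        return SequenceMatcher(None, term.lower(), candidate.lower()).ratio()

    return max(vocabulary, key=score)


# Hypothetical vocabulary of relation or column names found in the data.
vocabulary = ["directed_by", "release_year", "starred_actors"]

# The LLM wrote a free-form name; the rule layer resolves it to a real one.
mapped = map_to_vocabulary("release year", vocabulary)
print(mapped)
```

Keeping this mapping outside the LLM means the model only has to write a readable query, while a deterministic step grounds it in the data's own names.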

Paper Structure

This paper contains 66 sections, 1 equation, 5 figures, 12 tables.

Figures (5)

  • Figure 1: Overview of the TrustUQA framework
  • Figure 2: (a) The frequency of the number of answers, as labeled and as predicted by our method, on MetaQA. (b)-(d) The hyperparameters of demonstration size, retry, and self-consistency on WebQSP. (e) The time cost on MetaQA.
  • Figure 3: Error cases of TrustUQA.
  • Figure 4: Case study of TrustUQA for across structured data.
  • Figure 5: Examples for WikiSQL, MetaQA and CronQuestion.

Theorems & Definitions (1)

  • Definition 1