PASemiQA: Plan-Assisted Agent for Question Answering on Semi-Structured Data with Text and Relational Information

Hansi Yang, Qi Zhang, Wei Jiang, Jianguo Li

TL;DR

PASemiQA answers questions over semi-structured data by combining text and relational information in a two-stage, plan-guided approach. A planning module generates informative node sets $\mathcal{V}_q$ and relation paths $\{z_i\}$, which guide an LLM-based graph-traversing agent that extracts evidence and produces answers. The learning signal aligns plan generation with ground-truth paths via a KL divergence between $Q(z|q,a^*,\mathcal{G})$ and $P_\theta(z|q)$, with instruction-tuned LLMs generating the paths. Across the STaRK datasets (Amazon, MAG, PrimeKG), PASemiQA delivers state-of-the-art Hit@1 scores with competitive latency, demonstrating improved accuracy and reliability for QA on semi-structured data.
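To make the KL-based learning signal concrete, here is a minimal sketch of a KL divergence computed over a small discrete set of candidate relation paths. It is an illustration only: the direction KL(Q || P), the dictionary representation, the `eps` floor, and the example path names are all assumptions, not details taken from the paper.

```python
import math

def kl_divergence(q, p, eps=1e-12):
    """KL(Q || P) for two distributions given as {relation_path: probability}.

    Assumption: both distributions are defined over discrete relation paths,
    and paths missing from P get a small floor probability `eps`.
    """
    total = 0.0
    for path, q_prob in q.items():
        if q_prob == 0.0:
            continue  # terms with zero target mass contribute nothing
        p_prob = max(p.get(path, 0.0), eps)
        total += q_prob * math.log(q_prob / p_prob)
    return total

# Hypothetical target distribution Q: uniform over two ground-truth paths.
q = {"written_by": 0.5, "written_by->affiliated_with": 0.5}
# Hypothetical planner distribution P_theta over candidate paths.
p = {"written_by": 0.5, "written_by->affiliated_with": 0.25, "cites": 0.25}

loss = kl_divergence(q, p)
```

The divergence is zero when the planner's distribution matches the target exactly and grows as the planner shifts probability mass onto paths that the target does not support.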

Abstract

Large language models (LLMs) have shown impressive abilities in answering questions across various domains, but they often hallucinate on questions that require professional and up-to-date knowledge. To address this limitation, retrieval-augmented generation (RAG) techniques have been proposed, which retrieve relevant information from external sources to inform the model's responses. However, existing RAG methods typically focus on a single type of external data, such as vectorized text databases or knowledge graphs, and cannot adequately handle real-world questions on semi-structured data containing both text and relational information. To bridge this gap, we introduce PASemiQA, a novel approach that jointly leverages text and relational information in semi-structured data to answer questions. PASemiQA first generates a plan that identifies the text and relational information in the semi-structured data relevant to the question, and then uses an LLM agent to traverse the data and extract the necessary information. Our empirical results demonstrate the effectiveness of PASemiQA across semi-structured datasets from various domains, showcasing its potential to improve the accuracy and reliability of question answering systems on semi-structured data.
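The plan-then-traverse idea described in the abstract can be sketched on a toy graph. This is a simplified illustration, not the authors' implementation: the `Node` class, the `follow_path` helper, the relation names, and the hard-coded plan are all hypothetical, and no LLM is involved. Each node carries free text plus typed edges, and a plan consists of starting nodes and a relation path to follow.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass
class Node:
    """A node with a free-text attribute and typed outgoing edges (hypothetical schema)."""
    node_id: str
    text: str
    edges: Dict[str, List[str]] = field(default_factory=dict)  # relation -> neighbor ids

def follow_path(graph: Dict[str, Node], start_ids: List[str], relation_path: List[str]) -> Set[str]:
    """Follow a sequence of relations from the start nodes and return the nodes reached."""
    frontier = set(start_ids)
    for relation in relation_path:
        frontier = {
            neighbor
            for node_id in frontier
            for neighbor in graph[node_id].edges.get(relation, [])
        }
    return frontier

# Toy semi-structured data: text on every node, relations between nodes.
graph = {
    "paper_1": Node("paper_1", "A paper on graph retrieval.", {"written_by": ["author_1"]}),
    "author_1": Node("author_1", "Researcher in information retrieval.", {"affiliated_with": ["inst_1"]}),
    "inst_1": Node("inst_1", "A university.", {}),
}

# A hard-coded plan standing in for the planning module's output.
plan_nodes = ["paper_1"]
plan_path = ["written_by", "affiliated_with"]

reached = follow_path(graph, plan_nodes, plan_path)
evidence = [graph[node_id].text for node_id in sorted(reached)]
```

In this sketch the text of the reached nodes is collected as evidence; in a full system that evidence would be handed to an LLM to produce the answer.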

Paper Structure

This paper contains 22 sections, 6 equations, 4 figures, 10 tables, 1 algorithm.

Figures (4)

  • Figure 1: Comparison of different data structures to supplement question answering of LLM.
  • Figure 2: Complete framework of PASemiQA, consisting of two parts: a planning module and an LLM agent.
  • Figure 3: Comparison of PASemiQA with different values of $K$. $T$ is set to 5 by default.
  • Figure 4: Comparison of PASemiQA with different values of $T$. $K$ is set to 5 by default.

Theorems & Definitions (3)

  • Definition 1: Semi-structured data [wu2024stark]
  • Definition 2: QA on semi-structured data
  • Definition 3: Planning generation on semi-structured data