
Hybrid Graphs for Table-and-Text based Question Answering using LLMs

Ankush Agarwal, Ganesh S, Chaitanya Devaguptapu

TL;DR

ODYSSEY tackles Table-Text QA in a zero-shot, fine-tuning-free setting by constructing a Hybrid Graph that unifies tables and linked passages and by pruning context through question-guided traversal. The method achieves state-of-the-art zero-shot performance on Hybrid-QA and OTT-QA across GPT-3.5, GPT-4, and LLaMA-3, with gains in Exact Match (up to 10% on Hybrid-QA and 5.4% on OTT-QA) and F1, and input-token reductions of up to 53%. Key innovations include question-driven entity-header mapping, a multi-hop BFS traversal over the Hybrid Graph, and iterative reader prompts that feed compact, relevant context to the LLM. The work demonstrates the practicality of efficient cross-source reasoning and suggests directions for multi-modal extension and further efficiency gains.
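
As a rough illustration of question-driven entity-header mapping, the sketch below maps each question entity to the table headers whose name or column values contain it. It assumes the question entities have already been extracted, and it swaps in plain case-insensitive substring matching as a stand-in; ODYSSEY's actual mapping procedure may differ. The function name `map_entities_to_headers` and the toy table are hypothetical.

```python
from typing import Dict, List


def map_entities_to_headers(
    entities: List[str], headers: List[str], rows: List[List[str]]
) -> Dict[str, List[str]]:
    """Map each question entity to the headers it plausibly refers to.

    Hypothetical heuristic: an entity maps to a header if it occurs in the
    header name or in any cell of that header's column (case-insensitive).
    """
    mapping: Dict[str, List[str]] = {}
    for entity in entities:
        needle = entity.lower()
        matched: List[str] = []
        for col, header in enumerate(headers):
            in_header = needle in header.lower()
            in_column = any(needle in row[col].lower() for row in rows)
            if in_header or in_column:
                matched.append(header)
        mapping[entity] = matched  # empty list: no header found
    return mapping


# Toy table, invented for illustration.
headers = ["Title", "Publisher", "System"]
rows = [
    ["Game A", "Publisher X", "PS1"],
    ["Game B", "Publisher Y", "Arcade"],
]
print(map_entities_to_headers(["Publisher X", "system", "Game C"], headers, rows))
# {'Publisher X': ['Publisher'], 'system': ['System'], 'Game C': []}
```

Substring matching is deliberately crude: it over-matches short entities and misses paraphrases, so a stronger matcher (for example, one driven by the LLM itself) would likely be needed in practice.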

Abstract

Answering questions that require reasoning and aggregation across both structured (tables) and unstructured (raw text) data sources presents significant challenges. Current methods rely on fine-tuning and high-quality, human-curated data, which is difficult to obtain. Recent advances in Large Language Models (LLMs) have shown promising results for multi-hop question answering (QA) over single-source text data in a zero-shot setting, yet exploration into multi-source Table-Text QA remains limited. In this paper, we present a novel Hybrid Graph-based approach for Table-Text QA that leverages LLMs without fine-tuning. Our method constructs a unified Hybrid Graph from textual and tabular data, pruning information based on the input question to provide the LLM with relevant context concisely. We evaluate our approach on the challenging Hybrid-QA and OTT-QA datasets using state-of-the-art LLMs, including GPT-3.5, GPT-4, and LLaMA-3. Our method achieves the best zero-shot performance on both datasets, improving Exact Match scores by up to 10% on Hybrid-QA and 5.4% on OTT-QA. Moreover, our approach reduces token usage by up to 53% compared to the original context.
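
The two ideas in the abstract, a unified graph over tabular and textual data and question-based pruning, can be sketched as follows under simplifying assumptions made for this illustration: cells in the same row are one hop apart; a cell connects to a passage when it hyperlinks to it (`cell_links`) or when its value matches an entity that a passage mentions (`doc_entities`); and pruning is a plain breadth-first search with a fixed hop budget from seed nodes assumed to be derived from the question. All names (`build_hybrid_graph`, `bfs_prune`, the `cell:`/`doc:`/`ent:` node prefixes) and the toy data are hypothetical and are not the paper's implementation.

```python
from collections import defaultdict, deque
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def build_hybrid_graph(
    headers: List[str],
    rows: List[List[str]],
    cell_links: Dict[str, List[str]],
    doc_entities: Dict[str, List[str]],
) -> Graph:
    """Build an undirected graph over table cells, passages, and entities.

    Hypothetical node naming: "cell:<row>:<header>", "doc:<title>", "ent:<entity>".
    """
    graph: Dict[str, Set[str]] = defaultdict(set)

    def connect(a: str, b: str) -> None:
        graph[a].add(b)
        graph[b].add(a)

    # Entity-document part: each passage is linked to the entities it mentions.
    known_entities: Set[str] = set()
    for title, ents in doc_entities.items():
        for ent in ents:
            key = ent.lower()
            connect(f"doc:{title}", f"ent:{key}")
            known_entities.add(key)

    for r, row in enumerate(rows):
        cells = [f"cell:{r}:{h}" for h in headers]
        # Cells in the same row are one hop apart.
        for a, b in combinations(cells, 2):
            connect(a, b)
        for cell, value in zip(cells, row):
            # Table-to-text edges via explicit hyperlinks ...
            for title in cell_links.get(value, []):
                connect(cell, f"doc:{title}")
            # ... and via entities whose surface form equals the cell value.
            if value.lower() in known_entities:
                connect(cell, f"ent:{value.lower()}")
    return dict(graph)


def bfs_prune(graph: Graph, seeds: List[str], max_hops: int) -> Dict[str, int]:
    """Breadth-first search from the seeds; keep nodes within max_hops."""
    dist = {s: 0 for s in seeds if s in graph}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # hop budget exhausted: do not expand further
        for nb in sorted(graph[node]):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


# Toy data, invented for illustration.
headers = ["Title", "System"]
rows = [["Game A", "PS1"], ["Game B", "Arcade"]]
graph = build_hybrid_graph(headers, rows, cell_links={}, doc_entities={"Publisher X": ["Game A"]})
kept = bfs_prune(graph, seeds=["doc:Publisher X"], max_hops=3)
print(kept)
# {'doc:Publisher X': 0, 'ent:game a': 1, 'cell:0:Title': 2, 'cell:0:System': 3}
```

Only the nodes returned by `bfs_prune` would be serialized into the reader prompt. Here row 1 of the toy table is never reached, which is the kind of pruning that shrinks the LLM's input.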

Paper Structure

This paper contains 31 sections, 8 equations, 4 figures, 9 tables.

Figures (4)

  • Figure 1: Multi-Dimensional Improvements: Our method (with GPT-4 as the reader LLM) achieves superior results on Hybrid-QA and OTT-QA. Metrics: EM (Exact Match with the gold answer), F1 score, and Query Info Efficiency, a normalized metric ($\frac{1}{\text{Input Token Size}}$) that quantifies how efficiently the reader LLM's input represents the same documents with fewer tokens.
  • Figure 2: Case study on Hybrid-QA: Comparison of our method (ODYSSEY) against various baselines on an example from the Hybrid-QA dataset. Baselines: (i) Question + Context: Providing the LLM only the question without any additional context; (ii) Question + Summarized Context: Passing the question along with the summarized documents and table. Our method answers correctly because the Hybrid Graph efficiently connects "Street Fighter" from the document "Capcom" with the relevant table, guiding GPT-4 to the correct response, i.e., "PS1" from the "System" table column.
  • Figure 3: Overview of the ODYSSEY framework. Our method comprises 3 steps: i) Question Analysis, ii) Hybrid Graph Construction, and iii) Hybrid Graph Traversal. First, Question Analysis (1a in the figure) yields the question entities, the retrieved sub-table, and the entity-header mapping. Next, we construct the Entity-Document Graph (1b in the figure). Using the entity-document graph and the retrieved sub-table, we construct the Hybrid Graph (2 in the figure). Finally, we perform Hybrid Graph Traversal (3 in the figure) to obtain the pruned graph, which serves as input for the LLM. For a detailed walkthrough, refer to the Appendix.
  • Figure 4: Hop-wise analysis: For ODYSSEY (our method, hop-wise), we report the cumulative EM score (left panel) and the average token count consumed at each hop (right panel) for Llama3-8B, GPT-3.5, and GPT-4 on Hybrid-QA. Bars in the left panel denote the standard error, $\sqrt{\frac{em(1-em)}{n}}$.
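
The evaluation quantities named in these captions can be computed roughly as follows. This sketch assumes SQuAD-style answer normalization for EM and token-level F1 (the summary does not say which normalization the paper uses). `em_standard_error` implements the $\sqrt{\frac{em(1-em)}{n}}$ formula from Figure 4, and `query_info_efficiency` returns the raw reciprocal $\frac{1}{\text{Input Token Size}}$ from Figure 1 without the unspecified normalization. `token_reduction` gives one natural reading of "reduces token usage by X%", namely $1 - \text{pruned}/\text{original}$. All function names and example numbers are illustrative, not results from the paper.

```python
import math
import re
import string
from collections import Counter

PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization (assumed): lowercase, drop punctuation,
    drop the articles a/an/the, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(pred) == normalize_answer(gold))


def f1_score(pred: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_toks = normalize_answer(pred).split()
    gold_toks = normalize_answer(gold).split()
    overlap = sum((Counter(pred_toks) & Counter(gold_toks)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def em_standard_error(em: float, n: int) -> float:
    """Standard error of a proportion: sqrt(em * (1 - em) / n)."""
    return math.sqrt(em * (1 - em) / n)


def query_info_efficiency(input_tokens: int) -> float:
    """Raw 1 / input-token count (the figure's normalization is not reproduced)."""
    return 1.0 / input_tokens


def token_reduction(original_tokens: int, pruned_tokens: int) -> float:
    """Fraction of input tokens saved relative to the original context."""
    return 1.0 - pruned_tokens / original_tokens


print(exact_match("The PS1", "PS1"))                 # 1.0
print(round(f1_score("PlayStation PS1", "PS1"), 3))  # 0.667
print(round(em_standard_error(0.5, 100), 3))         # 0.05
print(round(token_reduction(1000, 470), 2))          # 0.53
```

Note that EM is all-or-nothing, while F1 gives partial credit when the prediction contains extra or missing tokens, which is why the two metrics are reported side by side.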