Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance

Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira

TL;DR

This work tackles multi-table question answering on large-scale, real-world tabular data by introducing SGAM, a graph-based framework that encodes human-curated schema links and join paths. SGAM builds a relational graph over table attributes, enabling interpretable, path-based reasoning with pruning and sub-path merging to reduce redundancy. Through experiments on the BIRD benchmark and a new CISS-based real-world dataset, SGAM achieves state-of-the-art performance on standard benchmarks and robust end-to-end QA in industrial settings, while also reducing reliance on extremely large LLM backbones via structured question decomposition. The approach offers practical benefits in transparency, scalability, and adaptability to new domains without extensive retraining.

Abstract

Large language models (LLMs) have shown promise in table question answering (Table QA). However, extending these capabilities to multi-table QA remains challenging because schema linking across complex tables is unreliable. Existing methods based on semantic similarity work well only on simplified, hand-crafted datasets and struggle with complex, real-world scenarios that involve numerous and diverse columns. To address this, we propose a graph-based framework that leverages human-curated relational knowledge to explicitly encode schema links and join paths. Given a natural language query, our method searches the graph to construct interpretable reasoning chains, aided by pruning and sub-path merging strategies that improve efficiency and coherence. Experiments on both standard benchmarks and a realistic, large-scale dataset demonstrate the effectiveness of our approach. To our knowledge, this is the first multi-table QA system applied to truly complex industrial tabular data.

Paper Structure

This paper contains 31 sections, 10 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: Overview of three Table QA paradigms. Top: Single-table QA, where all relevant information is contained within one table. Middle: Existing multi-table QA approaches that rely on semantic similarity for schema linking across tables. Bottom: Our proposed method, which leverages graph-based, human-curated relational knowledge to explicitly guide schema linking and inter-table reasoning.
  • Figure 2: Illustration of our Schema Graph Assist Multi-table QA framework. Given a complex user query and multiple tables, Step 1 retrieves relevant attributes from the tables using embedding-based semantic similarity. Step 2 uses a schema graph to automatically identify and traverse the minimal set of necessary join paths, such as GV.CASEID = AVOID.CASEID and GV.VEHNO = AVOID.VEHNO, to connect related information across tables. Step 3 uses an LLM to generate the final answer from the extracted information. This graph-guided reasoning enables the system to extract the target attribute (OCC.AGE) accurately by aligning semantically related fields and resolving multi-hop dependencies. In this example, the system correctly infers the driver’s age as 88.
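
Step 2 in Figure 2 can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the schema graph is stored as an undirected graph whose nodes are tables and whose edges carry the shared join keys, and it uses plain breadth-first search to find a shortest join path. The GV-AVOID keys come from the caption; the GV-OCC keys and the names `SCHEMA_EDGES`, `build_adjacency`, and `shortest_join_path` are hypothetical and chosen only for illustration. Pruning and sub-path merging are not shown.

```python
from collections import deque

# Hypothetical schema graph: each edge links two tables through shared keys.
# GV-AVOID keys are taken from the Figure 2 caption; GV-OCC keys are assumed.
SCHEMA_EDGES = {
    ("GV", "AVOID"): ["CASEID", "VEHNO"],
    ("GV", "OCC"): ["CASEID", "VEHNO"],  # assumption for illustration
}


def build_adjacency(edges):
    """Turn the edge dictionary into an undirected adjacency list."""
    adj = {}
    for (a, b), keys in edges.items():
        adj.setdefault(a, []).append((b, keys))
        adj.setdefault(b, []).append((a, keys))
    return adj


def shortest_join_path(adj, source, target):
    """Breadth-first search for a shortest join path between two tables.

    Returns the join conditions in traversal order, an empty list when
    source and target are the same table, or None when no path exists.
    """
    if source == target:
        return []
    parent = {source: None}  # table -> (previous table, join keys)
    queue = deque([source])
    while queue:
        table = queue.popleft()
        for neighbor, keys in adj.get(table, []):
            if neighbor in parent:
                continue
            parent[neighbor] = (table, keys)
            if neighbor == target:
                # Walk back to the source, collecting one hop at a time.
                hops = []
                node = target
                while parent[node] is not None:
                    prev, hop_keys = parent[node]
                    hops.append([f"{prev}.{k} = {node}.{k}" for k in hop_keys])
                    node = prev
                hops.reverse()
                return [cond for hop in hops for cond in hop]
            queue.append(neighbor)
    return None


adj = build_adjacency(SCHEMA_EDGES)
path = shortest_join_path(adj, "AVOID", "OCC")
for condition in path:
    print(condition)
```

Under these assumptions, the path from AVOID to OCC passes through GV, so the attribute OCC.AGE becomes reachable from an AVOID row through two explicit, inspectable joins rather than through a similarity guess.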