Plugging Schema Graph into Multi-Table QA: A Human-Guided Framework for Reducing LLM Reliance
Xixi Wang, Miguel Costa, Jordanka Kovaceva, Shuai Wang, Francisco C. Pereira
TL;DR
This work tackles multi-table question answering on large-scale, real-world tabular data by introducing SGAM, a graph-based framework that encodes human-curated schema links and join paths. SGAM builds a relational graph over table attributes, enabling interpretable, path-based reasoning with pruning and sub-path merging to reduce redundancy. Through experiments on the BIRD benchmark and a new CISS-based real-world dataset, SGAM achieves state-of-the-art performance on standard benchmarks and robust end-to-end QA in industrial settings, while also reducing reliance on extremely large LLM backbones via structured question decomposition. The approach offers practical benefits in transparency, scalability, and adaptability to new domains without extensive retraining.
Abstract
Large language models (LLMs) have shown promise in table Question Answering (Table QA). However, extending these capabilities to multi-table QA remains challenging due to unreliable schema linking across complex tables. Existing methods based on semantic similarity work well only on simplified, hand-crafted datasets and struggle to handle complex, real-world scenarios with numerous and diverse columns. To address this, we propose a graph-based framework that leverages human-curated relational knowledge to explicitly encode schema links and join paths. Given a natural language query, our method searches the graph to construct interpretable reasoning chains, using pruning and sub-path merging to improve efficiency and coherence. Experiments on both standard benchmarks and a realistic, large-scale dataset demonstrate the effectiveness of our approach. To our knowledge, this is the first multi-table QA system applied to truly complex industrial tabular data.
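To make the idea of searching a schema graph for join paths concrete, the following is a minimal sketch and not the authors' SGAM implementation. It assumes hypothetical tables and links (orders, customers, products, suppliers), uses plain breadth-first search as a stand-in for the graph search, and approximates sub-path merging as a de-duplicated union of join links. Pruning is reduced to skipping tables that were already visited.

```python
from collections import deque

# Hypothetical human-curated links: (table_a, column_a, table_b, column_b).
# These names are illustrative assumptions, not taken from the paper.
CURATED_LINKS = [
    ("orders", "customer_id", "customers", "id"),
    ("orders", "product_id", "products", "id"),
    ("products", "supplier_id", "suppliers", "id"),
]


def build_graph(links):
    """Build an undirected adjacency map: table -> [(neighbor, link), ...]."""
    graph = {}
    for link in links:
        table_a, _, table_b, _ = link
        graph.setdefault(table_a, []).append((table_b, link))
        graph.setdefault(table_b, []).append((table_a, link))
    return graph


def join_path(graph, start, goal):
    """Return the shortest list of links joining start to goal, or None."""
    if start == goal:
        return []
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        table = queue.popleft()
        for neighbor, link in graph.get(table, []):
            if neighbor in parent:
                continue  # skip tables that were already reached
            parent[neighbor] = (table, link)
            if neighbor == goal:
                # Walk back from the goal to recover the join links in order.
                path = []
                current = goal
                while parent[current] is not None:
                    previous, used_link = parent[current]
                    path.append(used_link)
                    current = previous
                return path[::-1]
            queue.append(neighbor)
    return None


def merge_paths(paths):
    """Union several join paths, keeping each shared link only once."""
    merged, seen = [], set()
    for path in paths:
        for link in path:
            if link not in seen:
                seen.add(link)
                merged.append(link)
    return merged


GRAPH = build_graph(CURATED_LINKS)
```

For a question that touches customers, products, and suppliers, one could call `join_path(GRAPH, "customers", "products")` and `join_path(GRAPH, "customers", "suppliers")`, then pass both results to `merge_paths` so the shared customers-orders-products prefix is joined only once. Each returned link names the exact columns to join on, which is what keeps the resulting reasoning chain inspectable.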
