TableRAG: Million-Token Table Understanding with Language Models
Si-An Chen, Lesly Miculicich, Julian Martin Eisenschlos, Zifeng Wang, Zilong Wang, Yanfei Chen, Yasuhisa Fujii, Hsuan-Tien Lin, Chen-Yu Lee, Tomas Pfister
TL;DR
TableRAG is a retrieval-augmented framework for understanding large tables. It combines tabular query expansion with schema and cell retrieval to drastically reduce prompt size while preserving reasoning capability. By encoding only a small, frequency-aware subset of schema and cell information and using a program-aided LM, it achieves state-of-the-art performance on million-token tables across ArcadeQA, BirdQA, and synthetic TabFact at lower token cost. The work also introduces two real-world million-token benchmarks and provides ablations showing the benefit of query expansion and of the retrieval components over baseline strategies. Overall, TableRAG offers a practical path to robust LM-based table QA beyond conventional context-length limits, with potential applications in real-world data analytics and other QA tasks where tables are too large to reason over directly.
Abstract
Recent advancements in language models (LMs) have notably enhanced their ability to reason with tabular data, primarily through program-aided mechanisms that manipulate and analyze tables. However, these methods often require the entire table as input, leading to scalability challenges due to positional bias or context-length constraints. In response to these challenges, we introduce TableRAG, a Retrieval-Augmented Generation (RAG) framework specifically designed for LM-based table understanding. TableRAG leverages query expansion combined with schema and cell retrieval to pinpoint crucial information before passing it to the LM. This enables more efficient data encoding and precise retrieval, significantly reducing prompt lengths and mitigating information loss. We have developed two new million-token benchmarks from the Arcade and BIRD-SQL datasets to thoroughly evaluate TableRAG's effectiveness at scale. Our results demonstrate that TableRAG's retrieval design achieves the highest retrieval quality, leading to new state-of-the-art performance on large-scale table understanding.
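
To make the retrieval idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a table stored as a dict of column lists, uses simple token overlap as a stand-in for an embedding model, and takes the expanded schema and cell queries as hand-written inputs where an LM would normally generate them. All function names and the `budget` and `k` parameters are hypothetical.

```python
from collections import Counter


def tokens(text):
    # Lowercased alphanumeric tokens; a toy stand-in for an embedding model.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in str(text))
    return set(cleaned.split())


def score(query, doc):
    # Jaccard overlap between the token sets of the query and the document.
    q, d = tokens(query), tokens(doc)
    return len(q & d) / len(q | d) if q and d else 0.0


def build_schema_index(table):
    # One entry per column: its name plus one example value.
    return [f"{col} (e.g. {vals[0]})" for col, vals in table.items() if vals]


def build_cell_index(table, budget):
    # Keep only the `budget` most frequent distinct (column, value) pairs,
    # so the index size does not grow with the number of rows.
    counts = Counter(
        (col, str(v)) for col, vals in table.items() for v in vals
    )
    return [f"{col}: {val}" for (col, val), _ in counts.most_common(budget)]


def retrieve(queries, index, k):
    # For each expanded query, keep the top-k entries with non-zero overlap.
    hits = []
    for q in queries:
        ranked = sorted(index, key=lambda doc: score(q, doc), reverse=True)
        for doc in ranked[:k]:
            if score(q, doc) > 0 and doc not in hits:
                hits.append(doc)
    return hits


if __name__ == "__main__":
    table = {
        "product": ["wallet", "wallet", "wallet", "bag"],
        "price": [10, 10, 10, 25],
    }
    # Hand-written stand-ins for LM-generated expanded queries.
    schema_hits = retrieve(["price"], build_schema_index(table), k=1)
    cell_hits = retrieve(["wallet"], build_cell_index(table, budget=2), k=1)
    # Only the retrieved schema and cell entries go into the LM prompt.
    print("Schema:", schema_hits)
    print("Cells:", cell_hits)
```

In this sketch the prompt size depends on `k` and the number of expanded queries rather than on the number of rows, which is the property the abstract attributes to schema and cell retrieval.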
