CRAFT: Training-Free Cascaded Retrieval for Tabular QA
Adarsh Singh, Kushal Raj Bhandari, Jianxi Gao, Soham Dan, Vivek Gupta
TL;DR
CRAFT is a training-free, cascaded retrieval framework for open-domain tabular QA. It combines Gemini-based preprocessing with a three-stage retrieval cascade (sparse filtering, dense mini-table ranking, neural reranking) and end-to-end QA using off-the-shelf LLMs, and it achieves competitive retrieval performance and strong end-to-end QA results on NQ-Tables without dataset-specific fine-tuning. The approach is robust to query paraphrasing, gains substantial token efficiency from sub-table contexts, and improves in F1 as retrieval depth increases, especially with larger LLMs. This makes it suitable for adaptable tabular QA in dynamic domains where labeled data for fine-tuning is impractical or unavailable.
Abstract
Table Question Answering (TQA) involves retrieving relevant tables from a large corpus to answer natural language queries. Traditional dense retrieval models, such as DTR and ColBERT, not only incur high computational costs for large-scale retrieval tasks but also require retraining or fine-tuning on new datasets, limiting their adaptability to evolving domains and knowledge. In this work, we propose $\textbf{CRAFT}$, a cascaded retrieval approach that first uses a sparse retrieval model to filter a subset of candidate tables before applying more computationally expensive dense models and neural re-rankers. Our approach achieves better retrieval performance than state-of-the-art (SOTA) sparse, dense, and hybrid retrievers. We further enhance table representations by generating table descriptions and titles using Gemini Flash 1.5. End-to-end TQA results using various Large Language Models (LLMs) on NQ-Tables, a subset of the Natural Questions dataset, demonstrate $\textbf{CRAFT}$'s effectiveness.
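The cascade described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: every function name and parameter (`cascade`, `k_sparse`, `k_dense`, `rerank_fn`) is hypothetical, tables are assumed to be pre-linearized into strings, the sparse stage is a plain BM25 formula, and the "dense" stage uses character-trigram cosine similarity purely as a dependency-free stand-in for a real embedding model. The reranker is an optional callable that a real system would back with a neural model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption, chosen for simplicity)."""
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Stage 1 scorer: plain BM25 over whitespace tokens."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of docs containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def char_ngram_vector(text, n=3):
    """Stand-in for a dense encoder: character trigram counts."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(items, scores, k):
    """Return the k items with the highest scores, best first."""
    ranked = sorted(zip(items, scores), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:k]]


def cascade(query, tables, k_sparse=3, k_dense=2, rerank_fn=None):
    """Cheap-to-expensive retrieval: sparse filter, dense rank, optional rerank."""
    # Stage 1: cheap sparse filter over the full corpus.
    candidates = top_k(tables, bm25_scores(query, tables), k_sparse)
    # Stage 2: costlier similarity scoring, applied only to the survivors.
    query_vec = char_ngram_vector(query)
    dense = [cosine(query_vec, char_ngram_vector(t)) for t in candidates]
    candidates = top_k(candidates, dense, k_dense)
    # Stage 3: most expensive scorer, applied to the smallest candidate set.
    if rerank_fn is not None:
        reranked = [rerank_fn(query, t) for t in candidates]
        candidates = top_k(candidates, reranked, len(candidates))
    return candidates


# Toy corpus of linearized tables (illustrative data, not from NQ-Tables).
tables = [
    "title: countries by population | country population | france 68 million | germany 84 million",
    "title: olympic medals | country gold silver | france 10 | usa 39",
    "title: chemical elements | element symbol | iron fe | gold au",
    "title: rivers of europe | river length | seine 777 km | rhine 1230 km",
]
result = cascade("population of france", tables)
```

The point of the design is that each stage sees fewer candidates than the one before it, so the most expensive scorer runs on only a handful of tables. This sketch ranks whole linearized tables in the dense stage, whereas the TL;DR above describes ranking mini-tables at that stage.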
