
CRAFT: Training-Free Cascaded Retrieval for Tabular QA

Adarsh Singh, Kushal Raj Bhandari, Jianxi Gao, Soham Dan, Vivek Gupta

TL;DR

CRAFT introduces a training-free, cascaded retrieval framework for open-domain tabular QA. It combines Gemini-based preprocessing, a three-stage retrieval cascade (sparse filtering, dense mini-table ranking, neural reranking), and end-to-end QA with off-the-shelf LLMs. With this design it achieves competitive retrieval performance and strong end-to-end QA results on NQ-Tables without dataset-specific fine-tuning. The approach is robust to query paraphrasing, gains substantial token efficiency by using sub-tables rather than full tables as context, and yields F1 improvements that grow with retrieval depth, especially with larger LLMs. This makes effective, adaptable tabular QA possible in dynamic domains where labeled data for fine-tuning is impractical to obtain or unavailable.
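The token-efficiency gain comes from giving the LLM a small sub-table instead of the whole table. The sketch below only illustrates that effect and rests on stated assumptions. The row filter is a hypothetical lexical-overlap heuristic, not CRAFT's mini-table construction, which this summary does not detail. `approx_tokens` counts whitespace-separated pieces as a crude proxy and is not a real LLM tokenizer. The fruit table is toy data, and all function names are illustrative.

```python
def words(text):
    # Lowercased word set with edge punctuation removed.
    return {w.strip("?.,").lower() for w in str(text).split()} - {""}

def linearize(header, rows):
    # Flatten a table into "cell | cell" lines, a common text format for LLM prompts.
    lines = [" | ".join(header)] + [" | ".join(map(str, r)) for r in rows]
    return "\n".join(lines)

def sub_table(query, header, rows, max_rows=2):
    # Hypothetical heuristic: keep the header plus up to max_rows rows that
    # share at least one word with the query, preserving original row order.
    q = words(query)
    scored = [(len(q & words(" ".join(map(str, r)))), i) for i, r in enumerate(rows)]
    top = sorted(scored, key=lambda p: p[0], reverse=True)[:max_rows]
    keep = sorted(i for score, i in top if score > 0)
    return header, [rows[i] for i in keep]

def approx_tokens(text):
    # Crude proxy for prompt length: whitespace-separated pieces.
    return len(text.split())

header = ["Fruit", "Color", "Price"]
rows = [["apple", "red", "1.20"], ["banana", "yellow", "0.50"],
        ["mango", "orange", "1.80"], ["grape", "purple", "2.10"]]
query = "What is the price of a mango?"

h, kept = sub_table(query, header, rows)
print(kept)  # [['mango', 'orange', '1.80']]
print(approx_tokens(linearize(header, rows)), approx_tokens(linearize(h, kept)))  # 25 10
```

This version only drops rows. A fuller version might also prune columns, and a real system would measure length with the target model's own tokenizer.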

Abstract

Table Question Answering (TQA) involves retrieving relevant tables from a large corpus to answer natural language queries. Traditional dense retrieval models, such as DTR and ColBERT, incur high computational costs on large-scale retrieval tasks. They also require retraining or fine-tuning on new datasets, which limits their adaptability to evolving domains and knowledge. In this work, we propose CRAFT, a cascaded retrieval approach that first uses a sparse retrieval model to filter the corpus down to a subset of candidate tables, then applies more computationally expensive dense models and neural re-rankers to that subset. Our approach achieves better retrieval performance than state-of-the-art (SOTA) sparse, dense, and hybrid retrievers. We further enhance table representations by generating table descriptions and titles with Gemini Flash 1.5. End-to-end TQA results with various Large Language Models (LLMs) on NQ-Tables, a subset of the Natural Questions dataset, demonstrate CRAFT's effectiveness.
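The cascade described in the abstract can be sketched as follows. This is a minimal, self-contained illustration built on assumptions, not CRAFT's implementation:

- Tables are assumed to be already linearized into strings.
- Stage 1 is a from-scratch Okapi BM25 scorer.
- Stage 2 uses character-trigram cosine similarity as a hypothetical stand-in for a dense encoder.
- Stage 3 uses a hypothetical `rerank_score` heuristic in place of a neural reranker.

The cutoffs `k_sparse`, `k_dense`, and `k_final` are illustrative names and values, not the paper's settings.

```python
import math
from collections import Counter

PUNCT = ".,?!:;()|\"'"

def tokenize(text):
    # Lowercase whitespace tokenization; strips edge punctuation and "|" separators.
    tokens = (t.strip(PUNCT).lower() for t in text.split())
    return [t for t in tokens if t]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    # Stage 1 scorer: standard Okapi BM25, cheap enough to run on the whole corpus.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in tokenized) / max(n, 1)
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        s = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores

def trigram_vector(text):
    # Hypothetical stand-in for a dense encoder: character-trigram counts.
    s = f"  {text.lower()} "
    return Counter(s[i:i + 3] for i in range(len(s) - 2))

def cosine(u, v):
    dot = sum(x * v[k] for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rerank_score(query, doc):
    # Hypothetical stand-in for a neural reranker: scores query and candidate
    # jointly by unigram coverage plus query-bigram matches.
    q, d = tokenize(query), tokenize(doc)
    d_set, d_bigrams = set(d), set(zip(d, d[1:]))
    uni = sum(t in d_set for t in q) / max(len(q), 1)
    bi = sum(bg in d_bigrams for bg in zip(q, q[1:])) / max(len(q) - 1, 1)
    return uni + bi

def cascade(query, tables, k_sparse=100, k_dense=10, k_final=3):
    # Stage 1: sparse filter over the full corpus.
    sparse = bm25_scores(query, tables)
    stage1 = sorted(range(len(tables)), key=lambda i: sparse[i], reverse=True)[:k_sparse]
    # Stage 2: "dense" similarity only on the stage-1 survivors.
    qv = trigram_vector(query)
    stage2 = sorted(stage1, key=lambda i: cosine(qv, trigram_vector(tables[i])),
                    reverse=True)[:k_dense]
    # Stage 3: the most expensive scorer only on the short list.
    return sorted(stage2, key=lambda i: rerank_score(query, tables[i]),
                  reverse=True)[:k_final]

tables = [
    "title: 2016 Summer Olympics medal table | nation | gold | silver | bronze",
    "title: List of tallest buildings | building | city | height",
    "title: FIFA World Cup winners | year | winner | runner-up",
    "title: 2012 Summer Olympics medal table | nation | gold | silver | bronze",
]
query = "which nation won the most gold medals at the 2016 olympics"
print(cascade(query, tables, k_sparse=2, k_dense=2, k_final=2))  # [0, 3]
```

The ordering is driven by cost. Each stage scores only the survivors of the previous one, so the most expensive scorer touches at most `k_dense` tables regardless of corpus size.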

Paper Structure

This paper contains 22 sections, 4 figures, 13 tables.

Figures (4)

  • Figure 1: Overview of the CRAFT Framework.
  • Figure 2: Prompt used for End-to-End question answering.
  • Figure 3: Prompt used for Table Decomposition.
  • Figure 4: Prompt used for Query Expansion.