
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners

Kezhi Kong, Jiani Zhang, Zhengyuan Shen, Balasubramaniam Srinivasan, Chuan Lei, Christos Faloutsos, Huzefa Rangwala, George Karypis

TL;DR

OpenTab tackles open-domain table reasoning by grounding LLM outputs in retrieved tables through a BM25-based retriever and a non-fine-tuned reasoning pipeline. The core idea decomposes reasoning into a Coder that generates SQL, a RowSelector that curates evidence rows, and a Reader that produces the final answer, augmented by a Generative Reranking & Sequential Reasoning (GRSR) strategy to mitigate hallucinations. Experiments on Open-WikiTables, WikiTableQuestions, and FEVEROUS show OpenTab outperforms baselines in open-domain and closed-domain settings, with up to 21.5% accuracy gains. The work demonstrates robust, scalable grounding for tabular data and provides ablations confirming the value of simple-to-complex SQL generation and the GRSR strategy.

Abstract

Large Language Models (LLMs) trained on large volumes of data excel at various natural language tasks, but they cannot handle tasks that require knowledge they were not trained on. One solution is to use a retriever that fetches relevant information to expand the LLM's knowledge scope. However, existing text-oriented, retrieval-based LLMs are not ideal for structured table data because of diverse data modalities and large table sizes. In this work, we propose OpenTab, an open-domain table reasoning framework powered by LLMs. OpenTab leverages a table retriever to fetch relevant tables and then generates SQL programs to parse the retrieved tables efficiently. Using the intermediate data derived from the SQL executions, it conducts grounded inference to produce an accurate response. Extensive experimental evaluation shows that OpenTab significantly outperforms baselines in both open- and closed-domain settings, achieving up to 21.5% higher accuracy. We further run ablation studies to validate the efficacy of the proposed system designs.
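
To make the retrieval step concrete, the following is a minimal sketch of BM25 ranking over tables that have been flattened into text. It is an illustration under stated assumptions, not the authors' implementation: the serialization of each table (title, headers, and cells joined into one string), the whitespace tokenizer, the `k1` and `b` defaults, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return document indices ordered from most to least relevant."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)

    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Hypothetical corpus: each table flattened to "title headers cells".
tables = [
    "olympic medal table country gold silver bronze norway 16 8 13",
    "list of rivers name length km country nile 6650 egypt",
    "album track listing title length writer",
]
query = "which country won the most gold medals"
ranking = bm25_rank(query, tables)
```

In this toy corpus the medal table ranks first because it matches both "country" and "gold". The top-ranked tables would then be handed to the reasoning stage.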

Paper Structure (22 sections, 6 figures, 7 tables)

Figures (6)

  • Figure 1: An overview of the OpenTab pipeline. OpenTab uses a Retriever to retrieve relevant sampled tables from a corpus of tables for a given natural language query, and then uses a Reasoner to output a natural language response.
  • Figure 2: Accuracy with an increasing number of retrieved tables. OpenTab's accuracy increases consistently as more tables are retrieved.
  • Figure 3: Examples illustrating the progressive simple-to-complex SQL proficiency. The basic SQL queries mainly select specific columns. The intermediate queries incorporate both column and row selection. The advanced queries employ additional operations, such as aggregation and text operations, that can manipulate and transform the tabular data. The cells in blue are outputs of the SQL programs.
  • Figure 4: Ablation study on the open-domain strategy. "JR" stands for "Joint Reasoning" and "SR" stands for "Sequential Reasoning".
  • Figure 5: Prompt and generation structures of both Coder and Reader.
  • ...and 1 more figures
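
The three SQL proficiency levels described in the Figure 3 caption can be sketched with Python's built-in `sqlite3` module. This is only an illustration of what basic, intermediate, and advanced queries might look like; the `medals` table, its rows, and the queries themselves are hypothetical and are not taken from the paper.

```python
import sqlite3

# Hypothetical in-memory table standing in for a retrieved table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE medals (country TEXT, gold INTEGER, silver INTEGER)")
conn.executemany(
    "INSERT INTO medals VALUES (?, ?, ?)",
    [("Norway", 16, 8), ("Germany", 12, 10), ("Canada", 4, 8)],
)

QUERIES = {
    # Basic: column selection only.
    "basic": "SELECT country, gold FROM medals",
    # Intermediate: column selection plus row filtering.
    "intermediate": "SELECT country, gold FROM medals WHERE gold > 10",
    # Advanced: aggregation combined with a text operation.
    "advanced": (
        "SELECT SUM(gold) FROM medals "
        "WHERE LOWER(country) LIKE '%y'"
    ),
}

# Execute each level and collect the rows it returns.
results = {level: conn.execute(sql).fetchall() for level, sql in QUERIES.items()}
conn.close()
```

Each level returns a progressively smaller, more targeted slice of the table: all rows with two columns, then only the rows that pass the filter, then a single aggregated value.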