Investigating Table-to-Text Generation Capabilities of LLMs in Real-World Information Seeking Scenarios

Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan

TL;DR

The study investigates how large language models perform on real-world table-to-text generation across data-insight and query-based tasks, introducing LoTNLG and F2WTQ as new benchmarks. It shows that GPT-4 excels at generation, evaluation, and feedback generation, while open-source LLMs lag significantly, a gap that limits broader deployment. The work also demonstrates that chain-of-thought prompting yields more faithful factual assessments and proposes LLM-driven feedback to improve lower-quality outputs. Public data and code enable reproducibility and further research in realistic table information seeking. Overall, the findings advance understanding of LLMs’ utility in practical table-to-text tasks and guide future improvements in fidelity and usability.

Abstract

Tabular data is prevalent across various industries, and users must spend significant time and effort understanding and manipulating it for their information-seeking purposes. Advances in large language models (LLMs) have shown enormous potential to improve user efficiency. However, the adoption of LLMs in real-world applications for table information seeking remains underexplored. In this paper, we investigate the table-to-text capabilities of different LLMs using four datasets within two real-world information-seeking scenarios. These include LogicNLG and our newly constructed LoTNLG dataset for data insight generation, along with FeTaQA and our newly constructed F2WTQ dataset for query-based generation. We structure our investigation around three research questions, evaluating the performance of LLMs in table-to-text generation, automated evaluation, and feedback generation, respectively. Experimental results indicate that the current high-performing LLM, specifically GPT-4, can effectively serve as a table-to-text generator, evaluator, and feedback generator, facilitating users' information-seeking purposes in real-world scenarios. However, a significant performance gap still exists between open-source LLMs (e.g., Tulu and LLaMA-2) and GPT-4 models. Our data and code are publicly available at https://github.com/yale-nlp/LLM-T2T.

Paper Structure (33 sections, 8 figures, 7 tables)

Figures (8)

  • Figure 1: The real-world table information seeking scenarios and research questions investigated in this paper.
  • Figure 2: An example of LoTNLG, where models are required to generate statements using the specified types of logical reasoning operations.
  • Figure 3: An example of F2WTQ, where models need to perform human-like reasoning to generate a response.
  • Figure 4: Distribution of logical reasoning types for the LoTNLG dataset.
  • Figure 5: An example of 1-shot direct-prediction prompting for the LogicNLG task.
  • ...and 3 more figures