
Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction

Zheye Deng, Chunkit Chan, Weiqi Wang, Yuxi Sun, Wei Fan, Tianshi Zheng, Yauwai Yim, Yangqiu Song

TL;DR

This work introduces LiveSum, a challenging benchmark for evaluating information integration in text-to-table generation, and proposes the Text-Tuple-Table (T3) pipeline, which decomposes the task into text-to-tuple extraction, information integration, and tuple-to-table generation. LiveSum provides 3,771 football match commentaries paired with ground-truth summary tables, enabling rigorous assessment of information integration beyond simple format transfer. Across fine-tuning, zero-shot, and few-shot settings, the results show that current LLMs struggle with integration, whereas the T3 pipeline yields substantial improvements and generalizes well to Struct-Bench and Wiki40b. The work highlights the value of structured intermediate representations and code-generated integration for robust text-to-table generation, and outlines limitations and directions for future research and broader applicability.
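The three-step decomposition can be illustrated with a minimal Python sketch. It assumes step 1 (text-to-tuple extraction, performed by an LLM in T3) has already produced (team, event) tuples, which are hand-written here. The names `EventTuple`, `integrate`, and `to_table`, and the four event labels, are hypothetical: they are not the LiveSum schema or the authors' code. Integration is done by plain counting in code, in the spirit of the code-generated integration described above.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Hypothetical subset of event columns; LiveSum itself summarizes eight event types.
EVENT_TYPES = ["Goals", "Shots", "Fouls", "Yellow Cards"]


class EventTuple(NamedTuple):
    """A (team, event) tuple, standing in for the output of step 1."""
    team: str
    event: str


def integrate(tuples: Iterable[EventTuple]) -> Counter:
    """Step 2 (information integration): aggregate tuples by counting them in code."""
    return Counter((t.team, t.event) for t in tuples)


def to_table(counts: Counter, teams: list[str]) -> list[list[object]]:
    """Step 3 (tuple-to-table): one row per team, one column per event type."""
    header: list[object] = ["Team", *EVENT_TYPES]
    # Counter returns 0 for missing keys, so events never seen become 0 cells.
    rows = [[team, *(counts[(team, ev)] for ev in EVENT_TYPES)] for team in teams]
    return [header, *rows]


# Step 1 is stubbed: in T3 an LLM would extract these tuples from commentary text.
extracted = [
    EventTuple("Home", "Shots"),
    EventTuple("Home", "Goals"),
    EventTuple("Away", "Fouls"),
    EventTuple("Home", "Shots"),
    EventTuple("Away", "Yellow Cards"),
]

table = to_table(integrate(extracted), ["Home", "Away"])
for row in table:
    print(row)
```

Keeping the integration step in deterministic code means the counts cannot drift the way they can when a model tallies events in free text; the counting logic here is only an illustration of that idea.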

Abstract

The task of condensing large chunks of textual information into concise and structured tables has gained attention recently due to the emergence of Large Language Models (LLMs) and their potential benefit for downstream tasks, such as text summarization and text mining. Previous approaches often generate tables that directly replicate information from the text, limiting their applicability in broader contexts, as text-to-table generation in real-life scenarios necessitates information extraction, reasoning, and integration. However, there is a lack of both datasets and methodologies for this task. In this paper, we introduce LiveSum, a new benchmark dataset created for generating summary tables of competitions based on real-time commentary texts. We evaluate the performance of state-of-the-art LLMs on this task in both fine-tuning and zero-shot settings, and additionally propose a novel pipeline called $T^3$ (Text-Tuple-Table) to improve their performance. Extensive experimental results demonstrate that LLMs still struggle with this task even after fine-tuning, while our approach offers substantial performance gains without explicit training. Further analyses demonstrate that our method exhibits strong generalization abilities, surpassing previous approaches on several other text-to-table datasets. Our code and data can be found at https://github.com/HKUST-KnowComp/LiveSum.

Paper Structure (72 sections, 2 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: An overview of the differences between our proposed LiveSum dataset and a previous dataset (Wiseman et al., 2017), as well as our proposed pipeline, T3 (Text-Tuple-Table), which consists of three steps.
  • Figure 2: Overview of the pipeline for constructing the LiveSum dataset illustrated with a sample sentence.
  • Figure 3: Eight types of event information (inner circle) that require summarization in LiveSum dataset, along with their common expressions (outer circle) in the commentary.
  • Figure 4: The performance of various LLMs under fine-tuning and zero-shot settings, as well as after applying the T3 method, on the test set of the LiveSum dataset. The average RMSE and error rate for each model are displayed, along with the error rate for each of the three difficulty sections. More results are given in the main results table.
  • Figure 5: Auto-QA coverage of the three methods. A point $(P, C)$ means that $P\%$ of the data achieve a coverage of $C\%$ or higher, as measured by Auto-QA.
  • ...and 3 more figures
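Figures 4 and 5 rest on per-table error metrics and a coverage curve. The Python sketch below assumes that RMSE is computed over all numeric cells of a predicted table against its gold table, and that error rate is the fraction of cells whose value differs; both definitions are assumptions, not taken from the paper. `coverage_point` follows the Figure 5 caption directly: given per-sample Auto-QA coverages, it returns the largest C such that at least P% of samples reach C or higher. All function names are hypothetical.

```python
import math

Table = list[list[float]]


def _paired_cells(pred: Table, gold: Table) -> list[tuple[float, float]]:
    """Flatten two same-shaped tables into (predicted, gold) cell pairs."""
    if len(pred) != len(gold) or any(len(p) != len(g) for p, g in zip(pred, gold)):
        raise ValueError("predicted and gold tables must have the same shape")
    return [(p, g) for prow, grow in zip(pred, gold) for p, g in zip(prow, grow)]


def rmse(pred: Table, gold: Table) -> float:
    """Root-mean-square error over all cells (assumed definition)."""
    cells = _paired_cells(pred, gold)
    return math.sqrt(sum((p - g) ** 2 for p, g in cells) / len(cells))


def error_rate(pred: Table, gold: Table) -> float:
    """Fraction of cells whose predicted value differs from gold (assumed definition)."""
    cells = _paired_cells(pred, gold)
    return sum(p != g for p, g in cells) / len(cells)


def coverage_point(coverages: list[float], p_percent: float) -> float:
    """Largest C such that at least p_percent% of samples have coverage >= C."""
    ranked = sorted(coverages, reverse=True)
    # Number of samples that must reach the threshold; multiplying before
    # dividing avoids float rounding surprises in ceil().
    k = max(1, math.ceil(p_percent * len(ranked) / 100))
    return ranked[k - 1]


pred = [[1, 2, 0], [0, 3, 1]]
gold = [[1, 3, 0], [0, 3, 2]]
print(round(rmse(pred, gold), 4))             # 0.5774
print(round(error_rate(pred, gold), 4))       # 0.3333
print(coverage_point([100, 80, 60, 40], 50))  # 80
```

Sweeping `p_percent` from 0 to 100 over one method's coverages traces a curve of the kind Figure 5 describes; a curve that stays higher for longer means more samples keep high coverage.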