Map&Make: Schema Guided Text to Table Generation

Naman Ahuja, Fenil Bardoliya, Chitta Baral, Vivek Gupta

TL;DR

Map&Make tackles open-domain Text-to-Table generation by decomposing text into propositional atoms to uncover latent schemas, then iteratively populating multi-table structures. The three-stage framework—Propositional Atomization, Schema Extraction, and Table Generation—uses a multi-agent prompting setup to improve coverage, reduce hallucinations, and enhance interpretability. Evaluations on Rotowire, Livesum, and Wiki40B show robust improvements in information coverage and table fidelity across diverse LLMs, with ablations confirming the necessity of core components. The approach offers a generalized, schema-agnostic solution for structured summarization with practical impact on information extraction and knowledge base construction.
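The three stages named above can be sketched as a small pipeline. This is only an illustrative sketch, not the authors' implementation: the `LLM` callable, the prompt wording, the reply formats (one statement per line, `name: col, col`, and `table | row | column | value`), and all function names are assumptions introduced here for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to a reply string.
LLM = Callable[[str], str]
Tables = Dict[str, Dict[str, Dict[str, str]]]  # table -> row -> column -> value


def atomize(text: str, llm: LLM) -> List[str]:
    """Stage 1 (Propositional Atomization): one atomic statement per reply line."""
    reply = llm(f"Split the text into atomic statements, one per line:\n{text}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def extract_schema(atoms: List[str], llm: LLM) -> Dict[str, List[str]]:
    """Stage 2 (Schema Extraction): parse assumed 'table: col, col' reply lines."""
    reply = llm("List tables as 'name: col, col':\n" + "\n".join(atoms))
    schema: Dict[str, List[str]] = {}
    for line in reply.splitlines():
        if ":" in line:
            name, cols = line.split(":", 1)
            schema[name.strip()] = [c.strip() for c in cols.split(",") if c.strip()]
    return schema


def generate_tables(atoms: List[str], schema: Dict[str, List[str]], llm: LLM) -> Tables:
    """Stage 3 (Table Generation): fill cells iteratively, one statement at a time."""
    tables: Tables = {name: {} for name in schema}
    for atom in atoms:
        reply = llm(
            f"Schema: {schema}\nStatement: {atom}\n"
            "Answer as 'table | row | column | value' or 'NONE'"
        )
        parts = [p.strip() for p in reply.split("|")]
        if len(parts) != 4:
            continue  # statement does not map to any cell
        table, row, col, value = parts
        # Only accept cells that exist in the extracted schema.
        if table in schema and col in schema[table]:
            tables[table].setdefault(row, {})[col] = value
    return tables


def map_and_make(text: str, llm: LLM) -> Tables:
    """Chain the three stages: atoms, then schema, then populated tables."""
    atoms = atomize(text, llm)
    schema = extract_schema(atoms, llm)
    return generate_tables(atoms, schema, llm)
```

Rejecting any cell whose table or column is missing from the extracted schema is one simple way to keep the generation step tied to the schema; the paper's own prompts and safeguards may differ.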

Abstract

Transforming dense, detailed, unstructured text into an interpretable and summarised table, also colloquially known as Text-to-Table generation, is an essential task for information retrieval. Current methods, however, struggle to determine what complex information to extract and how to extract it; they also lack the ability to infer data from the text. In this paper, we introduce a versatile approach, Map&Make, which "dissects" text into propositional atomic statements. This facilitates granular decomposition to extract the latent schema. The schema is then used to populate the tables that capture the qualitative nuances and the quantitative facts in the original text. Our approach is tested on two challenging datasets: Rotowire, renowned for its complex, multi-table schema, and Livesum, which demands numerical aggregation. By carefully identifying and correcting hallucination errors in Rotowire, we aim to achieve a cleaner and more reliable benchmark. We evaluate our method rigorously on a comprehensive suite of comparative and referenceless metrics. Our findings demonstrate significant improvements across both datasets, with better interpretability in Text-to-Table generation. Moreover, through detailed ablation studies and analyses, we investigate the factors contributing to superior performance and validate the practicality of our framework in structured summarization tasks.

Paper Structure

This paper contains 46 sections, 2 equations, 6 figures, 12 tables.

Figures (6)

  • Figure 1: Comparison between Naive methods and our method for Text-to-Table generation.
  • Figure 2: An illustration of our approach. a. Propositional Breakdown segments the text to generate atomic statements. b. Schema Extraction extracts the table structures to generate table schemas. c. Table Generation iteratively fills tables based on the atomic statements.
  • Figure 3: RMSE of Overcounting and Undercounting Instances for Livesum. Undercounted refers to cell values less than the ground truth; Overcounted refers to cell values greater than the ground truth.
  • Figure 4: Comparison of Schema-Coverage with Increasing Table Sizes for Rotowire.
  • Figure 5: Eight types of event information (inner circle) that require summarization in the Livesum dataset, along with their common expressions (outer circle) in the commentary.
  • ...and 1 more figures
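
The split described in the Figure 3 caption can be illustrated with a short sketch. This is one plausible reading of the caption, not the paper's evaluation code: the function name `split_rmse`, the flat lists of cell values, and the choice to return 0.0 for an empty group are assumptions made here.

```python
import math
from typing import Dict, Sequence


def split_rmse(predicted: Sequence[float], truth: Sequence[float]) -> Dict[str, float]:
    """RMSE computed separately over overcounted and undercounted cells.

    A cell is overcounted when its predicted value exceeds the ground truth
    and undercounted when it falls below it; exact matches are ignored.
    """
    over = [(p - t) ** 2 for p, t in zip(predicted, truth) if p > t]
    under = [(p - t) ** 2 for p, t in zip(predicted, truth) if p < t]

    def rmse(squared_errors):
        # Assumption: report 0.0 when a group has no cells.
        return math.sqrt(sum(squared_errors) / len(squared_errors)) if squared_errors else 0.0

    return {"overcounted": rmse(over), "undercounted": rmse(under)}
```

For example, with predicted values `[5, 2, 3, 7]` and ground truth `[3, 4, 3, 4]`, the first and last cells are overcounted, the second is undercounted, and the third is an exact match that contributes to neither group.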