Map&Make: Schema Guided Text to Table Generation
Naman Ahuja, Fenil Bardoliya, Chitta Baral, Vivek Gupta
TL;DR
Map&Make tackles open-domain Text-to-Table generation by decomposing text into propositional atoms to uncover latent schemas, then iteratively populating multi-table structures. The three-stage framework—Propositional Atomization, Schema Extraction, and Table Generation—uses a multi-agent prompting setup to improve coverage, reduce hallucinations, and enhance interpretability. Evaluations on Rotowire, Livesum, and Wiki40B show robust improvements in information coverage and table fidelity across diverse LLMs, with ablations confirming the necessity of core components. The approach offers a generalized, schema-agnostic solution for structured summarization with practical impact on information extraction and knowledge base construction.
Abstract
Transforming dense, detailed, unstructured text into an interpretable, summarised table, a task commonly known as Text-to-Table generation, is essential for information retrieval. Current methods, however, do not address what complex information to extract or how to extract it; they also lack the ability to infer data from the text. In this paper, we introduce a versatile approach, Map&Make, which "dissects" text into propositional atomic statements. This granular decomposition makes it possible to extract the latent schema. The schema is then used to populate tables that capture both the qualitative nuances and the quantitative facts in the original text. We evaluate our approach on two challenging datasets: Rotowire, known for its complex, multi-table schema, and Livesum, which demands numerical aggregation. By carefully identifying and correcting hallucination errors in Rotowire, we aim to provide a cleaner and more reliable benchmark. We evaluate our method rigorously on a comprehensive suite of comparative and referenceless metrics. Our findings demonstrate significant improvements on both datasets, along with better interpretability in Text-to-Table generation. Moreover, through detailed ablation studies and analyses, we investigate the factors contributing to the superior performance and validate the practicality of our framework in structured summarization tasks.
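
The flow described in the abstract (split text into atomic statements, derive a schema from them, then populate a table) can be illustrated with a toy sketch. This is not the authors' implementation: Map&Make drives its stages with LLM prompting, whereas the sketch below substitutes hand-written regular-expression rules so that it runs without a model. The function names, the sentence pattern, and the sample text are all assumptions made for illustration.

```python
import re

# Hypothetical pattern for atoms such as "Kevin Love scored 12 points".
# A real system would rely on an LLM rather than a fixed pattern.
ATOM_PATTERN = re.compile(
    r"^(?P<entity>[A-Z][\w ]*?) (?:scored|had|recorded) "
    r"(?P<value>\d+) (?P<attr>[a-z ]+)$"
)


def atomize(text):
    """Stage 1 (toy): split the text into single-fact statements."""
    return [part.strip() for part in re.split(r"[.;]\s*", text) if part.strip()]


def extract_schema(atoms):
    """Stage 2 (toy): infer row and column headers from the atoms."""
    rows, columns = [], []
    for atom in atoms:
        match = ATOM_PATTERN.match(atom)
        if match is None:
            continue  # atom does not fit the toy pattern; skip it
        if match["entity"] not in rows:
            rows.append(match["entity"])
        if match["attr"] not in columns:
            columns.append(match["attr"])
    return {"rows": rows, "columns": columns}


def populate(atoms, schema):
    """Stage 3 (toy): fill cells one atom at a time, summing repeated facts."""
    # Cells start as None so that unmentioned values stay empty.
    table = {row: {col: None for col in schema["columns"]} for row in schema["rows"]}
    for atom in atoms:
        match = ATOM_PATTERN.match(atom)
        if match is None:
            continue
        row, col = match["entity"], match["attr"]
        if row in table and col in table[row]:
            table[row][col] = (table[row][col] or 0) + int(match["value"])
    return table


# Made-up sample text, not taken from Rotowire or Livesum.
sample_text = (
    "LeBron James scored 30 points. LeBron James had 8 rebounds. "
    "Kevin Love scored 12 points; Kevin Love scored 4 points."
)
atoms = atomize(sample_text)
schema = extract_schema(atoms)
table = populate(atoms, schema)
```

Keeping the schema as a separate intermediate value, rather than building the table directly from the text, means the headers can be inspected before any cell is filled, and the summing step in `populate` stands in for the kind of numerical aggregation that Livesum requires.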
