Automatic Logical Forms improve fidelity in Table-to-Text generation

Iñigo Alonso, Eneko Agirre

TL;DR

The paper tackles the fidelity gap in table-to-text generation by introducing TlT, a two-stage framework that, given a table and a content selection, first generates automatic logical forms (LFs) and then renders text from those forms. It shows that automatic LFs raise fidelity by about 30 points over a comparable system that does not use LFs. Automatic and human evaluations, ablations, and qualitative analyses quantify the remaining challenges: automatic content selection first, then Logic-to-Text generation and, to a lesser extent, Table-to-Logic parsing. The work points toward practical, verifiable data-to-text systems and provides open-source resources to extend LF-based approaches to other structured inputs.

Abstract

Table-to-text systems generate natural language statements from structured data like tables. While end-to-end techniques suffer from low factual correctness (fidelity), a previous study reported gains when using manual logical forms (LF) that represent the selected content and the semantics of the target text. Given the manual step, it was not clear whether automatic LFs would be effective, or whether the improvement came from content selection alone. We present TlT which, given a table and a selection of the content, first produces LFs and then the textual statement. We show for the first time that automatic LFs improve quality, with an increase in fidelity of 30 points over a comparable system not using LFs. Our experiments allow us to quantify the remaining challenges for high factual correctness, with automatic selection of content coming first, followed by better Logic-to-Text generation and, to a lesser extent, better Table-to-Logic parsing.
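
The two-stage flow described in the abstract (table plus content selection, then a logical form, then text) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names, the nested-tuple LF encoding, the three operators, and the rule-based stand-ins are hypothetical and are not the paper's learned models or grammar. The small `execute` helper is also an addition of this sketch; it only shows why an explicit LF makes a statement checkable against the table.

```python
from typing import Any

Table = list[dict[str, str]]  # one dict per row, keyed by column name
LF = tuple                    # hypothetical encoding: (operator, arg1, arg2, ...)


def table_to_logic(table: Table, selection: dict[str, str]) -> LF:
    """Stage 1 stand-in (hypothetical): a real system would use a learned parser.

    Toy rule: count the rows whose selected column holds the selected value.
    """
    column, value = selection["column"], selection["value"]
    count = sum(1 for row in table if row[column] == value)
    return ("eq", ("count", ("filter_eq", "all_rows", column, value)), count)


def execute(lf: Any, table: Table) -> Any:
    """Evaluate a toy LF against the table (illustrative fidelity check)."""
    if not isinstance(lf, tuple):
        # Leaves are either the special token "all_rows" or literal values.
        return table if lf == "all_rows" else lf
    op, *args = lf
    if op == "filter_eq":
        rows, column, value = (execute(arg, table) for arg in args)
        return [row for row in rows if row[column] == value]
    if op == "count":
        return len(execute(args[0], table))
    if op == "eq":
        return execute(args[0], table) == execute(args[1], table)
    raise ValueError(f"unknown operator: {op}")


def logic_to_text(lf: LF) -> str:
    """Stage 2 stand-in (hypothetical): a template for the one LF shape above."""
    _, (_, (_, _, column, value)), count = lf
    return f"There are {count} rows where {column} is {value}."


def generate(table: Table, selection: dict[str, str]) -> tuple[LF, str]:
    """Run both stages: table and selection to LF, then LF to text."""
    lf = table_to_logic(table, selection)
    return lf, logic_to_text(lf)


# Toy data, invented for this sketch.
example_table = [
    {"opponent": "a", "result": "w"},
    {"opponent": "b", "result": "w"},
    {"opponent": "c", "result": "l"},
]
example_lf, example_text = generate(example_table, {"column": "result", "value": "w"})
```

Because the intermediate LF is explicit, it can be inspected or executed before any text is produced, which is the property the sketch is meant to highlight.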

Paper Structure

This paper contains 32 sections, 12 figures, 5 tables.

Figures (12)

  • Figure 1: Our proposed system to improve fidelity, $TlT$ (right), alongside a typical Table-to-Text architecture (left).
  • Figure 2: Example of a table with its caption, a logical form (in linearized and graph forms), its corresponding content selection values and the target statement. Note that w in the table stands for win. More details in the text.
  • Figure 3: Table2Logic architecture, with input at the top and output at the bottom. See text for details.
  • Figure 4: Model configurations used in the main experiments.
  • Figure 5: The logical form grammar after fixing the ambiguity issues in the original version (Chen2020a). We follow the same notation as in IRNet and Valuenet. The tokens to the left of the ::= represent non-terminals (node types in the graph). Tokens in italics represent the possible rules for each node, with pipes ($|$) separating the rules. The rules added to the original grammar in order to fix ambiguity issues are highlighted in green.
  • ...and 7 more figures