HeLM: Highlighted Evidence augmented Language Model for Enhanced Table-to-Text Generation

Junyi Bian, Xiaolei Qin, Wuhe Zou, Mengzuo Huang, Congyi Luo, Ke Zhang, Weidong Zhang

TL;DR

HeLM introduces a lightweight, two-module approach for table-to-text generation that first highlights evidence rows in a table and then generates a tailored summary. By training a highlighter and a summarizer with parameter-efficient fine-tuning and a greedy evidence-label search, HeLM achieves state-of-the-art results on FeTaQA and QTSumm, while improving interpretability through explicit evidence highlighting. The key contributions include a practical evidence-label construction strategy, a feedback-based training loop, and empirical demonstrations of the benefits of input table highlighting for generation quality and explainability. The approach holds practical potential for cost-efficient, private table-to-text systems and provides a foundation for further improving evidence-driven reasoning in structured-data tasks.

Abstract

Large models have demonstrated significant progress across various domains, particularly in tasks related to text generation. In table-to-text generation, many Large Language Model (LLM)-based methods currently resort to modifying prompts to invoke public APIs, which incurs costs and risks information leaks. With the advent of open-source large models, fine-tuning LLMs has become feasible. In this study, we conducted parameter-efficient fine-tuning on the LLaMA2 model. Unlike previous fine-tuning-based table-to-text methods, our approach injects reasoning information into the input by emphasizing table-specific row data. Our model consists of two modules: 1) a table reasoner that identifies relevant row evidence, and 2) a table summarizer that generates sentences based on the highlighted table. To facilitate this, we propose a search strategy to construct reasoning labels for training the table reasoner. On both the FeTaQA and QTSumm datasets, our approach achieved state-of-the-art results. Additionally, we observed that highlighting input tables significantly enhances the model's performance and provides valuable interpretability.
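
The two-module inference flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Reasoner` and `Summarizer` callables stand in for the fine-tuned LLaMA2 modules, and the keyword-overlap reasoner, the echo summarizer, the table layout, and all function names are assumptions made for this sketch.

```python
from typing import Callable, List, Sequence

Table = List[List[str]]  # assumed layout: row 0 is the header
Reasoner = Callable[[str, Table], List[int]]
Summarizer = Callable[[str, Table, Sequence[int]], str]


def run_pipeline(query: str, table: Table,
                 reasoner: Reasoner, summarizer: Summarizer) -> str:
    """Stage 1 selects evidence rows; stage 2 generates text conditioned on them."""
    evidence = reasoner(query, table)
    # Keep only valid data-row indices (index 0 would be the header).
    evidence = sorted({i for i in evidence if 1 <= i < len(table)})
    return summarizer(query, table, evidence)


def overlap_reasoner(query: str, table: Table) -> List[int]:
    """Hypothetical stand-in for the reasoner: rows sharing a token with the query."""
    tokens = set(query.lower().split())
    return [i for i, row in enumerate(table[1:], start=1)
            if tokens & {cell.lower() for cell in row}]


def echo_summarizer(query: str, table: Table, evidence: Sequence[int]) -> str:
    """Hypothetical stand-in for the summarizer: joins evidence rows, no LLM call."""
    return "; ".join(", ".join(table[i]) for i in evidence)
```

In a real system both stand-ins would be replaced by calls to the fine-tuned models; the point of the sketch is only that the summarizer receives the reasoner's row selection alongside the original table.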

Paper Structure

This paper contains 26 sections, 8 equations, 5 figures, 5 tables, and 1 algorithm.

Figures (5)

  • Figure 1: The overall framework of HeLM. The upper part demonstrates the training process, while the lower part illustrates the inference process.
  • Figure 2: Prompt of Highlighter and Summarizer. The elements within the red brackets can be replaced based on different examples.
  • Figure 3: Prompt of evidence labels distillation.
  • Figure 4: Case of table highlighting. $\{1\}$ corresponds to $E$ in the highlighting equation, and the visualized table corresponds to $T$. The output below is $T^{\star}$.
  • Figure 5: Cases from the FeTaQA Dataset. The highlighter of HeLM has highlighted specific parts of the table using red boxes. The rows in the table with a green background represent manually observed evidence related to the query.
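
The highlighting step referenced in Figure 4, which maps a table $T$ and an evidence set $E$ to a highlighted table $T^{\star}$, might look roughly like the sketch below. The serialization format, the `*` marker, and the 1-based numbering of data rows are assumptions for illustration; the paper's actual prompt format is the one shown in its Figure 2.

```python
from typing import Iterable, List


def highlight_table(table: List[List[str]], evidence: Iterable[int],
                    marker: str = "*") -> str:
    """Serialize `table` row by row, marking rows whose index is in `evidence`.

    Row 0 is treated as the header and is never marked; data rows are
    numbered from 1 (an assumption of this sketch).
    """
    chosen = set(evidence)
    lines = []
    for idx, row in enumerate(table):
        text = " | ".join(row)
        if idx > 0 and idx in chosen:
            text = f"{marker} {text} {marker}"
        lines.append(text)
    return "\n".join(lines)
```

For example, calling `highlight_table` with the evidence set `{1}` marks only the first data row and leaves every other row unchanged, so the summarizer still sees the full table.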