Towards Controlled Table-to-Text Generation with Scientific Reasoning

Zhixin Guo, Jianping Zhou, Jiexing Qi, Mingxuan Yan, Ziwei He, Guanjie Zheng, Zhouhan Lin, Xinbing Wang, Chenghu Zhou

TL;DR

This work presents a new task: generating fluent and logical descriptions of scientific tabular data that match user preferences, with the aim of automating scientific document analysis.

Abstract

The sheer volume of scientific experimental results and complex technical statements, often presented in tabular form, is a formidable barrier for readers trying to find the information they want. Scientific reasoning and content generation that adheres to user preferences each pose distinct challenges. In this work, we present a new task: generating fluent and logical descriptions of scientific tabular data that match user preferences, with the aim of automating scientific document analysis. To facilitate research in this direction, we construct a challenging new dataset, CTRLSciTab, consisting of table-description pairs extracted from the scientific literature, with highlighted cells and a corresponding domain-specific knowledge base. We evaluate popular pre-trained language models to establish baselines and propose a novel architecture that outperforms competing approaches. The results show that large models struggle to produce accurate content that aligns with user preferences. As the first of its kind, our work should motivate further research in scientific domains.

Paper Structure (7 sections, 2 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: An illustration of controlled table-to-text generation incorporating explicit scientific reasoning stages. (a) represents the input information, (b) illustrates the inherent reasoning processes of language models, and (c) displays the resultant descriptions. Yellow highlights user preferences, red relates to tabular knowledge, and blue indicates scientific reasoning content. Potential scientific reasoning steps are outlined at the bottom. The original table is adapted from Vaswani et al. (2017).
  • Figure 2: Overview of CTRLSciTab construction steps, including mining domain-specific knowledge from PDF source, aligning highlighted cells to table descriptions, and expert verification.
  • Figure 3: An illustration of the CTRLSciTabNet structure: (a) depicts the architecture of our unsupervised retriever; (b) outlines the two-step operation of CTRLSciTabNet, in which the retriever selects the top-$n$ domain-relevant sentences and a pre-trained language model (the generator) then uses these sentences alongside the tabular input and highlighted cells.
  • Figure 4: Case study of CTRLSciTabNet. Yellow cells are the highlighted cells. W/o BKG denotes the model without domain-specific knowledge. Green text marks correct statements supported by the tabular data, and red text marks incorrect statements.
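
The two-step operation described in the Figure 3 caption (retrieve the top-$n$ domain-relevant sentences, then pass them to a generator together with the table and highlighted cells) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's unsupervised retriever is replaced here by plain bag-of-words cosine similarity, the generator is omitted, and the function names and the `<hl>`, `<cell>`, and `<bkg>` markers are hypothetical choices made only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top_n(query, sentences, n=2):
    """Stand-in retriever: rank knowledge sentences by similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(s))), i, s) for i, s in enumerate(sentences)]
    scored.sort(key=lambda item: (-item[0], item[1]))  # best score first, stable on ties
    return [s for _, _, s in scored[:n]]


def linearize(table, highlighted):
    """Flatten a table to text, marking highlighted (row, col) cells (hypothetical markers)."""
    parts = []
    for r, row in enumerate(table["rows"]):
        for c, value in enumerate(row):
            tag = "<hl>" if (r, c) in highlighted else "<cell>"
            parts.append(f"{tag} {table['header'][c]} | {value}")
    return " ".join(parts)


def build_generator_input(table, highlighted, knowledge, n=2):
    """Combine the linearized table with retrieved background sentences."""
    query = " ".join(
        f"{table['header'][c]} {table['rows'][r][c]}" for r, c in sorted(highlighted)
    )
    background = retrieve_top_n(query, knowledge, n)
    return linearize(table, highlighted) + " <bkg> " + " ".join(background)


# Toy data, invented for this example only.
table = {"header": ["Model", "BLEU"], "rows": [["Transformer", "28.4"], ["ConvS2S", "25.2"]]}
highlighted = {(0, 1)}
knowledge = [
    "BLEU measures n-gram overlap with reference translations.",
    "Dropout is a regularization technique.",
    "The Transformer relies on self-attention.",
]
print(build_generator_input(table, highlighted, knowledge, n=1))
```

In a full pipeline, the resulting string would be fed to a pre-trained sequence-to-sequence model acting as the generator. Dropping the `<bkg>` portion corresponds to the "W/o BKG" setting mentioned in the Figure 4 caption.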