Logical Natural Language Generation from Open-Domain Tables
Wenhu Chen, Jianshu Chen, Yu Su, Zhiyu Chen, William Yang Wang
TL;DR
This work introduces logical NLG, a task that generates statements entailed by open-domain tables rather than mere surface descriptions. The authors construct LogicNLG on top of TabFact, define automatic fidelity metrics (parsing-based, NLI-based, adversarial), and explore a spectrum of baselines including non-pretrained, pretrained, and coarse-to-fine architectures. They demonstrate that pretrained language models substantially improve fluency and fidelity, while adversarial and RL-based approaches trade fluency for fidelity; a coarse-to-fine strategy partially mitigates fidelity gaps. The paper provides comprehensive automatic and human evaluations, analyzes various logical operations, and offers a practical LogicNLG benchmark and codebase to spur future research in logic-aware NLG.
Abstract
Neural natural language generation (NLG) models have recently shown remarkable progress in fluency and coherence. However, existing studies on neural NLG are primarily focused on surface-level realizations with limited emphasis on logical inference, an important aspect of human thinking and language. In this paper, we suggest a new NLG task in which a model must generate natural language statements that can be \emph{logically entailed} by the facts in an open-domain semi-structured table. To facilitate the study of the proposed logical NLG problem, we use the existing TabFact dataset \cite{chen2019tabfact}, which features a wide range of logical/symbolic inferences, as our testbed, and propose new automatic metrics to evaluate the fidelity of generation models w.r.t.\ logical inference. The new task poses challenges to the existing monotonic generation frameworks due to the mismatch between sequence order and logical order. In our experiments, we comprehensively survey different generation architectures (LSTM, Transformer, Pre-Trained LM) trained with different algorithms (RL, Adversarial Training, Coarse-to-Fine) on the dataset and make the following observations: 1) Pre-Trained LM can significantly boost both the fluency and logical fidelity metrics, 2) RL and Adversarial Training trade fluency for fidelity, 3) Coarse-to-Fine generation can help partially alleviate the fidelity issue while maintaining high language fluency. The code and data are available at \url{https://github.com/wenhuchen/LogicNLG}.
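
To make the distinction between surface-level and logically entailed statements concrete, the following is a minimal Python sketch. It is not taken from the paper or its codebase: the toy table, its values, and the helper names (`surface_fact`, `superlative_fact`, `count_fact`) are all hypothetical, and the hand-written checks only stand in for the idea of entailment; they are not the paper's automatic fidelity metrics.

```python
# Toy table in the spirit of an open-domain table; all values are made up.
table = [
    {"nation": "Canada", "gold": 3, "silver": 1},
    {"nation": "Mexico", "gold": 2, "silver": 3},
    {"nation": "Colombia", "gold": 1, "silver": 3},
]


def surface_fact(rows, nation, column, value):
    """Surface-level claim: restates the content of a single cell."""
    return any(r["nation"] == nation and r[column] == value for r in rows)


def superlative_fact(rows, nation, column):
    """Logical claim: `nation` alone has the highest value in `column`."""
    best = max(r[column] for r in rows)
    winners = [r["nation"] for r in rows if r[column] == best]
    return winners == [nation]


def count_fact(rows, column, threshold, expected_count):
    """Logical claim: exactly `expected_count` rows exceed `threshold`."""
    return sum(r[column] > threshold for r in rows) == expected_count


# "Canada obtained 3 gold medals." (copies one cell)
print(surface_fact(table, "Canada", "gold", 3))
# "Canada obtained the most gold medals." (needs a comparison over all rows)
print(superlative_fact(table, "Canada", "gold"))
# "Two nations won more than 2 silver medals." (needs counting)
print(count_fact(table, "silver", 2, 2))
# "Mexico obtained the most gold medals." (fluent, but not entailed)
print(superlative_fact(table, "Mexico", "gold"))
```

The last three statements cannot be verified by looking up a single cell: each requires an operation over several rows. The final one shows how a fluent sentence can still be unfaithful to the table, which is the kind of failure a fidelity metric is meant to detect.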
