The Challenge of Achieving Attributability in Multilingual Table-to-Text Generation with Question-Answer Blueprints
Aden Haussmann
TL;DR
The paper investigates whether QA blueprints can improve the attributability of multilingual Table-to-Text verbalisation. By extending the TaTA dataset with QA blueprints and finetuning mT5 variants, it demonstrates clear gains in English but limited improvements in multilingual settings, owing to translation errors and weak alignment between the blueprints and the generated text. It compares several automatic metrics (chrF, BLEU, FactKB, StATA), finds StATA the most reliable automatic measure of attributability on TaTA, and highlights significant multilingual challenges. The findings suggest that larger multilingual datasets, constrained decoding, and more language-aware blueprint strategies are needed to realise attributability gains across languages. The paper also releases tools to facilitate further TaTA research.
Abstract
Multilingual Natural Language Generation (NLG) is challenging due to the lack of training data for low-resource languages. Yet some low-resource languages have as many as tens of millions of speakers, which makes improving NLG tools for them important. Table-to-Text NLG is a demanding test of models' reasoning abilities, and it is especially difficult in the multilingual setting. System outputs are often not attributable, or faithful, to the data in the source table. Intermediate planning techniques such as Question-Answer (QA) blueprints have been shown to improve attributability in summarisation. This work explores whether QA blueprints make multilingual Table-to-Text outputs more attributable to the input tables. It extends TaTA, a challenging multilingual Table-to-Text dataset that includes African languages, with QA blueprints. Sequence-to-sequence language models are then finetuned on this dataset, with and without blueprints. Results show that QA blueprints improve performance for models finetuned and evaluated only on English examples, but yield no gains in the multilingual setting. This is due to inaccuracies in machine-translating the blueprints from English into the target languages when generating the training data, and to models failing to rely closely on the blueprints they generate. An in-depth analysis examines why this is challenging.
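The blueprint idea can be made concrete with a small sketch. Assuming, hypothetically, that each training target is a serialised list of question-answer pairs followed by the table verbalisation, the helpers below build such a target, split a model output back into its blueprint and text parts, and compute a crude answer-coverage score as one possible proxy for how closely the text follows its blueprint. The separator tokens, the `QAPair` type and all function names are illustrative assumptions, not the paper's released tools or its exact serialisation format.

```python
from dataclasses import dataclass

# Hypothetical separator tokens; the paper's actual serialisation may differ.
QA_SEP = " | "
BLUEPRINT_END = " [TEXT] "


@dataclass
class QAPair:
    question: str
    answer: str


def build_target(blueprint: list[QAPair], verbalisation: str) -> str:
    """Serialise a QA blueprint followed by the reference text into one target string."""
    plan = QA_SEP.join(f"Q: {qa.question} A: {qa.answer}" for qa in blueprint)
    return plan + BLUEPRINT_END + verbalisation


def split_output(generated: str) -> tuple[str, str]:
    """Split a generated sequence into (blueprint, text); no marker means no blueprint."""
    marker = BLUEPRINT_END.strip()
    if marker not in generated:
        return "", generated.strip()
    plan, text = generated.split(marker, 1)
    return plan.strip(), text.strip()


def answer_coverage(blueprint: list[QAPair], text: str) -> float:
    """Fraction of blueprint answers found verbatim (case-insensitive) in the text."""
    if not blueprint:
        return 0.0
    hits = sum(qa.answer.lower() in text.lower() for qa in blueprint)
    return hits / len(blueprint)


if __name__ == "__main__":
    # Toy values invented for illustration only.
    bp = [QAPair("What share of households had electricity?", "62%"),
          QAPair("Which year does the table cover?", "2016")]
    target = build_target(bp, "In 2016, 62% of households had electricity.")
    plan, text = split_output(target)
    print(plan)
    print(text)
    print(answer_coverage(bp, text))
```

Verbatim string matching is deliberately simple: it will miss answers that are paraphrased or translated (for example, a number written out in words), so a score like this is only a rough signal of blueprint-text alignment, particularly outside English.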
