Table-to-Text Generation with Pretrained Diffusion Models
Aleksei S. Krylov, Oleg D. Somov
TL;DR
This paper investigates applying diffusion-based generation to table-to-text, adapting a pretrained encoder–decoder diffusion model (GENIE) for conditional generation on the ToTTo dataset. By comparing diffusion-based generation with autoregressive baselines and exploring sampling accelerators (DPM-Solver++) and aggregation strategies (MBR, ROVER), it analyzes training regimes, length constraints, and pre-training effects. The study shows diffusion models can achieve comparable quality and diversity to autoregressive methods, with training-from-scratch diffusion outperforming autoregressive baselines and MBR providing robust aggregation advantages; faster sampling comes with some loss in diversity. Overall, diffusion approaches emerge as a viable, versatile direction for table-to-text, warranting further exploration of transformer variants, resource allocation, and handling more complex table structures.
Abstract
Diffusion models have demonstrated significant potential for achieving state-of-the-art performance across various text generation tasks. In this systematic study, we investigate their application to the table-to-text problem by adapting a diffusion model to the task and conducting an in-depth analysis. Our experiments cover multiple aspects of diffusion model training. We explore the influence of the sampling strategy by integrating a recent diffusion model accelerator, DPM-Solver++, into our core model. We test different prediction aggregation methods, such as ROVER and Minimum Bayes-Risk (MBR). We also study the impact of the pre-training phase in diffusion models and the influence of generation length constraints. In addition, we compare diffusion model generation with auto-regressive text-to-text models at different temperature settings to evaluate diversity. Our key observation is that diffusion models balance quality and diversity, whereas auto-regressive text-to-text models do not succeed at handling both at the same time. Furthermore, we find that, to achieve the highest possible quality, it is preferable to use a regular sampler with the strictest length constraint to create multiple samples, and then use MBR to aggregate the predictions. However, if one is prepared to give up a high level of diversity in order to accelerate the process, the fast sampler DPM-Solver++ can be used instead. Our findings reveal that diffusion models achieve results comparable to auto-regressive models in the table-to-text domain, highlighting them as a promising research direction for this task.
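To make the aggregation step concrete, the following is a minimal sketch of MBR selection over several sampled descriptions of the same table. It is an illustration under stated assumptions, not the implementation used in this study: the function names (token_f1, mbr_select) are hypothetical, and a simple unigram F1 stands in for whatever utility metric (for example BLEU) is actually used to compare candidates.

```python
from collections import Counter
from typing import Callable, Sequence


def token_f1(hypothesis: str, reference: str) -> float:
    """Unigram F1 between two whitespace-tokenized strings.

    Assumption: this is a stand-in utility chosen for brevity; it is not
    claimed to be the metric used in the paper.
    """
    hyp_tokens = hypothesis.split()
    ref_tokens = reference.split()
    if not hyp_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(hyp_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def mbr_select(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = token_f1,
) -> str:
    """Return the candidate with the highest mean utility against the others.

    Each remaining sample acts as a pseudo-reference, so the selected output
    is the one that agrees most with the rest of the sample set.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")
    if len(candidates) == 1:
        return candidates[0]

    best_index = 0
    best_score = float("-inf")
    for i, hypothesis in enumerate(candidates):
        total = sum(
            utility(hypothesis, other)
            for j, other in enumerate(candidates)
            if j != i
        )
        score = total / (len(candidates) - 1)
        # Strict comparison: on ties, the earliest candidate is kept.
        if score > best_score:
            best_index, best_score = i, score
    return candidates[best_index]


if __name__ == "__main__":
    # Hypothetical samples for one table, e.g. from different random seeds.
    samples = [
        "the team won the cup in 2010",
        "the team won the cup in 2010 .",
        "a club lost a game",
    ]
    print(mbr_select(samples))
```

Because selection only needs pairwise comparisons between samples, the same routine can be applied to outputs from either the regular sampler or DPM-Solver++; the cost grows quadratically with the number of samples.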
