Unifying Structured Data as Graph for Data-to-Text Pre-Training

Shujie Li; Liang Li; Ruiying Geng; Min Yang; Binhua Li; Guanghu Yuan; Wanwei He; Shao Yuan; Can Ma; Fei Huang; Yongbin Li

TL;DR

This work presents UniD2T, a unified data-to-text pre-training framework that casts diverse structured data as graph-to-text tasks. Built on a T5 backbone, it introduces a structure-enhanced Transformer with a dedicated position matrix and an attention matrix that explicitly encode graph connectivity. By aggregating the large PreData and DownData corpora and transforming inputs into Levi graphs and connected graphs, UniD2T reports strong results on six benchmarks spanning table-, graph-, and key-value-to-text generation, with ablations and analyses supporting the importance of graph structure. The approach also shows cross-domain transfer, data efficiency in few-shot regimes, and robustness to graph size.

Abstract

Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved to be powerful in enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures or designed training objectives tailored to a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., table, key-value data, knowledge graph) into the graph format and cast different data-to-text generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation by designing a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer that encodes the relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix that incorporates graph structures into the original Transformer by taking the available explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source code is available at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/unid2t.
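
As a rough illustration of the unification step for knowledge graphs, the sketch below applies the standard Levi transformation, in which every relation of a (head, relation, tail) triple becomes a node of its own. The function name `triples_to_levi_graph` and the node/edge list representation are assumptions made for this example; the exact conversion rules used by UniD2T, including those for tables and key-value data, are not given in this excerpt.

```python
from typing import Dict, List, Tuple


def triples_to_levi_graph(
    triples: List[Tuple[str, str, str]],
) -> Tuple[List[str], List[Tuple[int, int]]]:
    """Convert (head, relation, tail) triples into a Levi graph.

    Entities are shared across triples; each relation occurrence gets
    a fresh node that sits between its head and its tail.
    Hypothetical helper, not taken from the UniD2T code base.
    """
    nodes: List[str] = []
    entity_index: Dict[str, int] = {}
    edges: List[Tuple[int, int]] = []

    def entity_id(label: str) -> int:
        # Reuse the node if this entity has been seen before.
        if label not in entity_index:
            entity_index[label] = len(nodes)
            nodes.append(label)
        return entity_index[label]

    for head, relation, tail in triples:
        h = entity_id(head)
        t = entity_id(tail)
        # The relation becomes a node of its own (Levi transformation).
        r = len(nodes)
        nodes.append(relation)
        edges.append((h, r))
        edges.append((r, t))

    return nodes, edges


if __name__ == "__main__":
    example = [
        ("Alan Turing", "born in", "London"),
        ("Alan Turing", "field", "Logic"),
    ]
    graph_nodes, graph_edges = triples_to_levi_graph(example)
    print(graph_nodes)
    print(graph_edges)
```

Representing relations as nodes means that relation labels are tokenized and encoded like any other node text, so a single node-and-edge format can, in principle, cover all three input types.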

Paper Structure (42 sections, 2 equations, 8 figures, 16 tables)

Introduction
Related Works
Data-to-Text Generation
Data-to-Text Pre-training Models
Pre-training Data Construction
Existing Pre-training Datasets (PreData)
Existing Downstream Datasets (DownData)
Unifying Structured Data
Methodology
Problem Definition
Model Architecture
Structure-enhanced Transformer
Position Matrix Construction
Attention Matrix Construction
Pre-training Objectives
...and 27 more sections

Figures (8)

Figure 1: Unify data in three formats into one graph structure.
Figure 2: Simplified version of model input and connections between nodes.
Figure 3: Transformer blocks on the T5-encoder side. The relative position and attention matrices in the self-attention calculation will be replaced by two novel position and attention matrices.
Figure 4: We construct a new position matrix $\mathbf{P}_{\text{emb}}^{\text{new}}$ to replace the original position matrix $\mathbf{P}_{\text{emb}}$ used in Equation (2). We first set an auxiliary matrix for each edge between two nodes, and then copy the content of the auxiliary matrix into the final position matrix. The distances of nodes lacking direct connections will be set to "$\pm \textbf{inf}$". The lighter the color, the farther the distance is.
Figure 5: We construct a new attention matrix $\mathbf{A}_{\text{mask}}^{\text{new}}$ to replace the attention mask $\mathbf{A}_{\text{mask}}$ used in the Transformer self-attention of Equation (2). Colored cells are set to 1 and uncolored cells are set to 0. Blue represents global attention, gray represents the self-connection of nodes, and green and yellow represent the two connected edges.
...and 3 more figures
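
The captions of Figures 4 and 5 can be read as the following minimal NumPy sketch, which builds a token-level attention matrix and position matrix from node spans and edges. Several details here are assumptions rather than the paper's specification: the function name `build_structure_matrices`, the use of leading "global" tokens, the sign convention for $\pm\textbf{inf}$ (positive when the target token comes later in the sequence), and the choice to compute each edge's auxiliary matrix as if the two connected nodes were adjacent in the sequence.

```python
import numpy as np


def build_structure_matrices(spans, edges, num_global=1):
    """Build (attention, position) matrices for a linearized graph.

    spans:      list of (start, end) token ranges, one per node, end exclusive;
                tokens 0..num_global-1 are treated as global tokens.
    edges:      list of (u, v) node index pairs, treated as undirected.
    Hypothetical sketch; not the UniD2T implementation.
    """
    total = max(end for _, end in spans)
    idx = np.arange(total)
    offsets = idx[None, :] - idx[:, None]  # offsets[i, j] = j - i

    # Default: no attention, and +/-inf distance for unconnected tokens.
    attention = np.zeros((total, total), dtype=np.int64)
    position = np.where(offsets >= 0, np.inf, -np.inf)

    def connect(rows, cols, rel):
        attention[np.ix_(rows, cols)] = 1
        position[np.ix_(rows, cols)] = rel

    # Global attention: global tokens see, and are seen by, every token.
    glob = np.arange(num_global)
    connect(glob, idx, offsets[np.ix_(glob, idx)])
    connect(idx, glob, offsets[np.ix_(idx, glob)])

    # Self-connection: tokens inside one node attend to each other.
    for start, end in spans:
        tok = np.arange(start, end)
        connect(tok, tok, offsets[np.ix_(tok, tok)])

    # Edges: one auxiliary matrix per edge, copied into the final matrices.
    for u, v in edges:
        first, second = sorted((u, v), key=lambda node: spans[node][0])
        tok = np.concatenate([np.arange(*spans[first]), np.arange(*spans[second])])
        local = np.arange(len(tok))
        aux = local[None, :] - local[:, None]  # distances as if adjacent
        connect(tok, tok, aux)

    return attention, position


if __name__ == "__main__":
    # One global token, then three nodes of 2, 1 and 2 tokens; nodes 0 and 1 are linked.
    att, pos = build_structure_matrices([(1, 3), (3, 4), (4, 6)], [(0, 1)])
    print(att)
    print(pos)
```

In this sketch the attention matrix plays the role of $\mathbf{A}_{\text{mask}}^{\text{new}}$ and the position matrix the role of $\mathbf{P}_{\text{emb}}^{\text{new}}$ before any bucketing or embedding lookup; how infinite distances are mapped to T5's relative position buckets is left out because the excerpt does not describe it.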
