Autoregressive Models for Knowledge Graph Generation

Thiviyan Thanapalasingam; Antonis Vozikis; Peter Bloem; Paul Groth

TL;DR

The paper tackles knowledge graph generation by learning a joint distribution $p_\theta(G)$ over subgraphs, capturing semantic constraints without explicit rule supervision. It introduces ARK, an autoregressive approach that linearizes graphs as sequences of triples and generates them token by token, and SAIL, a variational extension that enables controlled generation and interpolation in latent space. On the IntelliGraphs benchmark, ARK and SAIL achieve high semantic validity (89.2% to 100.0% across datasets), strong novelty, and efficient compression, outperforming KGE baselines that score triples independently. The work finds that model capacity, in particular hidden dimensionality $d_{\text{model}} \ge 64$, matters more than depth, and that GRU-based decoders are a computationally efficient choice for KG generation, with practical implications for knowledge base augmentation and query answering.
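
Token-by-token generation of this kind corresponds to the standard autoregressive chain-rule factorization. As a sketch, assume a graph $G$ is linearized in some fixed triple order into a token sequence $x_1, \ldots, x_T$. The symbols $x_t$ and $T$, and the fixed ordering, are assumptions introduced here for illustration and are not taken from the paper:

```latex
p_\theta(G) = \prod_{t=1}^{T} p_\theta(x_t \mid x_1, \ldots, x_{t-1})
```

Under this reading, each factor is a distribution over the vocabulary of entities, relations, and special tokens, and sampling continues until an end-of-sequence token is drawn.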

Abstract

Knowledge Graph (KG) generation requires models to learn complex semantic dependencies between triples while maintaining domain validity constraints. Unlike link prediction, which scores triples independently, generative models must capture interdependencies across entire subgraphs to produce semantically coherent structures. We present ARK (Auto-Regressive Knowledge Graph Generation), a family of autoregressive models that generate KGs by treating graphs as sequences of (head, relation, tail) triples. ARK learns implicit semantic constraints directly from data, including type consistency, temporal validity, and relational patterns, without explicit rule supervision. On the IntelliGraphs benchmark, our models achieve 89.2% to 100.0% semantic validity across diverse datasets while generating novel graphs not seen during training. We also introduce SAIL, a variational extension of ARK that enables controlled generation through learned latent representations, supporting both unconditional sampling and conditional completion from partial graphs. Our analysis reveals that model capacity (hidden dimensionality >= 64) is more critical than architectural depth for KG generation, with recurrent architectures achieving comparable validity to transformer-based alternatives while offering substantial computational efficiency. These results demonstrate that autoregressive models provide an effective framework for KG generation, with practical applications in knowledge base completion and query answering.
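
To make the "graphs as sequences of (head, relation, tail) triples" idea concrete, here is a minimal Python sketch of linearization and its inverse. The function names `linearize` and `delinearize`, the special-token strings, and the toy identifiers are all hypothetical, chosen for illustration. The paper's actual tokenization and triple ordering may differ.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical special-token names; the paper's actual vocabulary may differ.
BOS, EOS = "<BOS>", "<EOS>"


def linearize(triples: List[Triple]) -> List[str]:
    """Flatten triples into [BOS, h1, r1, t1, h2, r2, t2, ..., EOS]."""
    tokens = [BOS]
    for head, relation, tail in triples:
        tokens.extend([head, relation, tail])
    tokens.append(EOS)
    return tokens


def delinearize(tokens: List[str]) -> List[Triple]:
    """Recover triples from a token sequence, stopping at the first EOS."""
    body = tokens[1:] if tokens and tokens[0] == BOS else list(tokens)
    if EOS in body:
        body = body[: body.index(EOS)]
    if len(body) % 3 != 0:
        raise ValueError("token count is not a multiple of three")
    return [(body[i], body[i + 1], body[i + 2]) for i in range(0, len(body), 3)]


# Toy graph with made-up identifiers, for illustration only.
toy_graph = [
    ("movie_1", "has_director", "director_1"),
    ("movie_1", "has_actor", "actor_1"),
]
sequence = linearize(toy_graph)
print(sequence)
```

A graph with $n$ triples becomes a sequence of $3n + 2$ tokens under this scheme, and a sampled sequence that does not decode into whole triples can be rejected as malformed before any semantic validity check.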

Paper Structure (28 sections, 6 equations, 6 figures, 6 tables)

Introduction
Preliminaries
Sequential Decoding for Knowledge Graph Generation
Graph Input Processing
Autoregressive Knowledge Generation (ARK)
Sequential Autoregressive Knowledge Graph Generation with Latents (SAIL)
Evaluation
Compression Code Length
Sampling from Latent Variable, $z$
Interpolation in Latent Space
Ablation Study
Related Work
Conclusion
Appendix
Experimental Details
...and 13 more sections

Figures (6)

Figure 1: Overview of Model Architectures. (a) SAIL Encoder: Multi-layer perceptron (MLP) processes linearized KG sequences $[\texttt{BOS}, h_1, r_1, t_1, h_2, r_2, t_2, \ldots, \texttt{EOS}]$, with mean pooling to produce fixed-size representations. Linear projections generate latent distribution parameters $\mu$ and $\log\sigma$. (b) SAIL Decoder: GRU-based decoder conditions on sampled latent code $z \sim \mathcal{N}(\mu, \sigma^2)$ by broadcasting $z$ to all sequence positions and concatenating with embeddings $[M_1, M_2, \ldots, M_n]$ at each timestep. (c) ARK Decoder: GRU decoder for ARK operates without latent conditioning, processing embedded sequences directly through stacked GRU layers. (d) Sampling: Autoregressive generation proceeds token-by-token with causal masking until EOS token or maximum length.
Figure 2: Latent space visualization for the wd-movies dataset. (a) t-SNE projection shows clear clustering by genre. (b) Smooth interpolation paths connect different movie types. (c) Decoded graphs along the interpolation path show gradual transitions in cast and genre attributes, maintaining semantic validity throughout.
Figure 3: Graphs generated by ARK conditioned on director entities for (a) Wes Anderson (left) and (b) Tim Burton (right). Node colors indicate entity types: movies (blue), directors (red), actors (green), and genres (purple).
Figure 4: Effect of architectural hyperparameters on the semantic validity and novelty. (Left) Valid & Novel rate as a function of the number of GRU layers, showing stable performance across depths with high variance. (Center) Performance variation with model dimension (hidden units), demonstrating a sharp improvement threshold around 64 dimensions, followed by consistent high performance. (Right) Scatter plot of individual experimental runs showing the relationship between model dimension and generation quality, with color indicating the number of layers.
Figure 5: Effect of progressive conditioning on Knowledge Graph generation for the syn-paths dataset. Subfigure (a) quantifies novelty and validity under increasing conditioning, (b) shows the corresponding reduction in sample diversity, and (c) provides an example of a conditioned generation where the model completes a partially specified graph.
...and 1 more figure
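
The latent conditioning described in the Figure 1 caption (parameters $\mu$ and $\log\sigma$, a sample $z \sim \mathcal{N}(\mu, \sigma^2)$, and $z$ broadcast to every sequence position and concatenated with the embeddings) can be illustrated with a small pure-Python sketch. It assumes the usual VAE reparameterization $z = \mu + \sigma \epsilon$ with $\epsilon \sim \mathcal{N}(0, 1)$, which the caption does not state explicitly. The helper names `sample_latent` and `condition_on_latent` are hypothetical.

```python
import math
import random
from typing import List


def sample_latent(mu: List[float], log_sigma: List[float],
                  rng: random.Random) -> List[float]:
    """Draw z = mu + exp(log_sigma) * eps, with eps ~ N(0, 1) per dimension.

    Assumption: the standard reparameterization used in VAEs.
    """
    return [m + math.exp(ls) * rng.gauss(0.0, 1.0)
            for m, ls in zip(mu, log_sigma)]


def condition_on_latent(embeddings: List[List[float]],
                        z: List[float]) -> List[List[float]]:
    """Broadcast z to every position and concatenate it with each embedding."""
    return [list(e) + list(z) for e in embeddings]


rng = random.Random(0)
mu, log_sigma = [0.0, 1.0], [0.0, -1.0]
z = sample_latent(mu, log_sigma, rng)

# Two toy token embeddings of width 3; values are arbitrary.
token_embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
decoder_inputs = condition_on_latent(token_embeddings, z)
print(len(decoder_inputs), len(decoder_inputs[0]))
```

Each decoder input then has width equal to the embedding width plus the latent width, so the recurrent decoder sees the same graph-level code at every timestep.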

Theorems & Definitions (2)

Definition 2.1: Knowledge Graph Generation
Definition 2.2: Semantic Validity
