Graph Language Models

Moritz Plenz; Anette Frank

TL;DR

Graph Language Models (GLMs) address the gap between text-focused LMs and structure-preserving GNNs by initializing a graph transformer (GT) with pretrained LM weights and equipping it with graph-aware attention. Graphs of triplets (GoTs) are preprocessed into extended Levi graphs, and two variants, a local and a global GLM, combine local triplet semantics with global graph context, which enables joint graph-text encoding. Across ConceptNet and Wikidata/Wikipedia tasks, GLMs outperform LM-, GNN-, and GT-based baselines in both linear probing and finetuning; larger models and global attention yield the strongest gains when long-range connectivity matters. The work demonstrates that pretrained LM representations benefit graph reasoning and proposes a practical framework for jointly embedding knowledge graphs and natural language in NLP applications.
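The Levi-graph preprocessing mentioned above can be illustrated with a minimal sketch. It assumes the common Levi-graph construction in which every relation occurrence becomes its own node and concepts are shared across triplets. The function name `to_levi_graph` and the toy triplets are hypothetical, and the sketch works on whole nodes rather than on the tokens the model actually processes.

```python
def to_levi_graph(triplets):
    """Turn (head, relation, tail) triplets into a Levi graph.

    Assumption: concepts are shared across triplets, while each relation
    occurrence gets a fresh node, giving edges head -> relation -> tail.
    Returns (nodes, edges): nodes is a list of labels, edges a list of
    (source_index, target_index) pairs.
    """
    nodes = []
    concept_index = {}
    edges = []

    def concept_node(label):
        # Reuse the node if this concept was already seen.
        if label not in concept_index:
            concept_index[label] = len(nodes)
            nodes.append(label)
        return concept_index[label]

    for head, relation, tail in triplets:
        h = concept_node(head)
        t = concept_node(tail)
        r = len(nodes)          # a new node for this relation occurrence
        nodes.append(relation)
        edges.append((h, r))
        edges.append((r, t))
    return nodes, edges


# Hypothetical toy graph, not taken from the paper's data.
nodes, edges = to_levi_graph([
    ("dog", "is a", "animal"),
    ("cat", "is a", "animal"),
])
```

Here `animal` appears once as a node even though two triplets mention it, while `is a` appears twice; this sharing is what lets information flow between triplets through common concepts.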

Abstract

While Language Models (LMs) are the workhorses of NLP, their interplay with structured knowledge graphs (KGs) is still actively researched. Current methods for encoding such graphs typically either (i) linearize them for embedding with LMs, which underutilizes structural information, or (ii) use Graph Neural Networks (GNNs) to preserve the graph structure, although GNNs cannot represent text features as well as pretrained LMs. In our work we introduce a novel LM type, the Graph Language Model (GLM), that integrates the strengths of both approaches and mitigates their weaknesses. The GLM parameters are initialized from a pretrained LM to enhance understanding of individual graph concepts and triplets. Simultaneously, we design the GLM's architecture to incorporate graph biases, thereby promoting effective knowledge distribution within the graph. This enables GLMs to process graphs, texts, and interleaved inputs of both. Empirical evaluations on relation classification tasks show that GLM embeddings surpass both LM- and GNN-based baselines in supervised and zero-shot settings, demonstrating their versatility.

Paper Structure (42 sections, 2 equations, 10 figures, 9 tables)


Introduction
Related Work
LMs
GNNs
Graph Transformers
Preliminary: Graph Transformers (GT)
Positional Encoding
Masked Attention
Graph Language Model
GLM vs. GT
Graph preprocessing
Positional Encodings
Local and global GLM
Joint graph and text encoding
Uni- and Bidirectional LMs
...and 27 more sections

Figures (10)

Figure 1: The GLM inherits its architecture from a Graph Transformer, and its parameters from a LM. This enables it to jointly reason over graphs and language.
Figure 2: Example of graph preprocessing in our GLM. Fig. \ref{fig:preprocessing_levi} shows relative distances for dog, i.e., when dog is attending to other tokens. The red Graph-to-Graph (G2G) connections only exist for the $g$GLM, not for the $\ell$GLM.
Figure 3: Relative position matrix $P$ for tokens in Fig. \ref{fig:preprocessing_levi}. Entries with G2G have no relative position ($\ell$GLM) or are initialized from $+\infty$ ($g$GLM). Cf. §\ref{sec:app:model}.
Figure 4: KG population test results during training. $g$GLM outperforms T5 set by up to 6 points in Fig. \ref{fig:exp:wiki:baselines:rel_class}.
Figure 5: Relative positions $P$ and masking $M$ for $\ell$GLM and $g$GLM.
...and 5 more figures
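
The captions of Figures 3 and 5 describe a relative position matrix $P$ and an attention mask $M$ that differ between the $\ell$GLM and the $g$GLM. The following is a simplified sketch of that idea, not the paper's implementation. It assumes node-level rather than token-level positions, a sign convention of "position of the attended node minus position of the attending node", and head/relation/tail offsets of 0/1/2 within a triplet. The function name `relative_positions` and the toy triplets are hypothetical.

```python
import math


def relative_positions(triplets, global_attention=False):
    """Build a node-level relative position matrix P and attention mask M.

    Assumptions (illustrative only):
    - concepts are shared across triplets, relations get one node per triplet;
    - nodes in the same triplet see signed offsets (head=0, relation=1, tail=2);
    - all other pairs are Graph-to-Graph (G2G): no position (None) and masked
      out in the local variant, +inf and attendable in the global variant.
    """
    labels = []
    concept_index = {}
    members = []  # one (head_id, relation_id, tail_id) tuple per triplet

    def concept(label):
        if label not in concept_index:
            concept_index[label] = len(labels)
            labels.append(label)
        return concept_index[label]

    for head, relation, tail in triplets:
        h = concept(head)
        r = len(labels)
        labels.append(relation)
        t = concept(tail)
        members.append((h, r, t))

    n = len(labels)
    g2g = math.inf if global_attention else None
    P = [[g2g] * n for _ in range(n)]
    for i in range(n):
        P[i][i] = 0
    for triplet in members:
        for pos_i, i in enumerate(triplet):
            for pos_j, j in enumerate(triplet):
                P[i][j] = pos_j - pos_i  # node i attends to node j

    # M[i][j] is True where node i is allowed to attend to node j.
    M = [[P[i][j] is not None for j in range(n)] for i in range(n)]
    return labels, P, M


toy = [("dog", "is a", "animal"), ("cat", "is a", "animal")]
labels, P_local, M_local = relative_positions(toy)
_, P_global, M_global = relative_positions(toy, global_attention=True)
```

In the local variant, `dog` cannot attend to `cat` because the two never share a triplet; in the global variant the same pair is attendable and carries the $+\infty$ G2G entry, which matches the distinction drawn in the captions.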
