PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning

Jaejun Lee; Minsung Hwang; Joyce Jiyoung Whang

TL;DR

The paper addresses the lack of theoretical guarantees for knowledge graph representation learning (KGRL) by deriving the first PAC-Bayesian generalization bounds for KGRL methods. It introduces ReED, a Relation-aware Encoder-Decoder framework that pairs a relation-aware message-passing (RAMP) encoder with one of two triplet classification decoders, translational distance (TD) or semantic matching (SM), and that can express a broad set of existing KGRL methods. The main contributions are a transductive PAC-Bayesian bound for deterministic triplet classifiers and the resulting ReED-specific bounds, which quantify how depth, parameter count, and weight norms affect generalization; simplified forms of these bounds highlight the benefits of mean aggregators and parameter sharing. Empirically, the authors test these factors on three real-world knowledge graphs and find that mean aggregators, smaller parameter counts, and controlled weight norms coincide with smaller generalization gaps, so the theory captures trends seen in practice. The work offers principled design guidance for KGRL methods and lays groundwork for extending PAC-Bayesian analyses to broader KGRL architectures, including attention-based models, with potential impact on practical knowledge graph completion systems.

Abstract

While a number of knowledge graph representation learning (KGRL) methods have been proposed over the past decade, very few theoretical analyses have been conducted on them. In this paper, we present the first PAC-Bayesian generalization bounds for KGRL methods. To analyze a broad class of KGRL models, we propose a generic framework named ReED (Relation-aware Encoder-Decoder), which consists of a relation-aware message passing encoder and a triplet classification decoder. Our ReED framework can express at least 15 different existing KGRL models, including not only graph neural network-based models such as R-GCN and CompGCN but also shallow-architecture models such as RotatE and ANALOGY. Our generalization bounds for the ReED framework provide theoretical grounds for the commonly used tricks in KGRL, e.g., parameter-sharing and weight normalization schemes, and guide desirable design choices for practical KGRL methods. We empirically show that the critical factors in our generalization bounds can explain actual generalization errors on three real-world knowledge graphs.
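To make the encoder-decoder split concrete, here is a minimal pure-Python sketch of one relation-aware message-passing step followed by a TransE-style translational distance score. It is not the paper's Definition 3.1 or 3.2, which are not reproduced on this page. The function names (`ramp_layer`, `td_score`), the ReLU activation, the square weight matrices, and the scalar score are all assumptions made for illustration.

```python
import math


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def ramp_layer(h, triplets, W_self, W_rel, aggregator="mean"):
    """One hypothetical relation-aware message-passing step.

    h:        dict mapping entity -> feature vector
    triplets: list of (head, relation, tail); messages flow head -> tail
    W_self:   square matrix applied to an entity's own vector
    W_rel:    dict mapping relation -> square matrix applied to messages
    """
    new_h = {}
    for v, h_v in h.items():
        # Collect relation-specific messages from incoming neighbors.
        messages = [matvec(W_rel[r], h[u]) for (u, r, t) in triplets if t == v]
        agg = [0.0] * len(h_v)
        for m in messages:
            agg = [a + b for a, b in zip(agg, m)]
        if aggregator == "mean" and messages:
            agg = [a / len(messages) for a in agg]
        combined = [s + a for s, a in zip(matvec(W_self, h_v), agg)]
        new_h[v] = [max(0.0, c) for c in combined]  # ReLU (assumed activation)
    return new_h


def td_score(h, rel_vec, head, rel, tail):
    """TransE-style score -||h_head + r - h_tail||_2 (higher = more plausible)."""
    diff = [a + b - c for a, b, c in zip(h[head], rel_vec[rel], h[tail])]
    return -math.sqrt(sum(d * d for d in diff))


identity = [[1.0, 0.0], [0.0, 1.0]]
h0 = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
triplets = [("a", "r1", "c"), ("b", "r1", "c")]

h_mean = ramp_layer(h0, triplets, identity, {"r1": identity}, aggregator="mean")
h_sum = ramp_layer(h0, triplets, identity, {"r1": identity}, aggregator="sum")
score = td_score(h_mean, {"r1": [0.0, 1.0]}, "a", "r1", "c")
```

In this toy graph, entity `c` has two incoming neighbors, so the sum aggregator yields `[2.0, 2.0]` while the mean aggregator yields `[1.5, 1.5]`. The sum grows with the number of neighbors and the mean does not, which is the kind of aggregator effect the bounds are said to capture.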

Paper Structure (27 sections, 5 theorems, 81 equations, 3 figures, 4 tables)

Introduction
Knowledge Graph Completion via Triplet Classification
ReED Framework for Knowledge Graph Representation Learning
Relation-Aware Message-Passing (RAMP) Encoder
Triplet Classification Decoder
Expressing Existing KGRL Methods Using ReED
Generalization Bounds for ReED
Transductive PAC-Bayesian Approach for Deterministic Triplet Classifiers
PAC-Bayesian Generalization Bounds for ReED
Experiments
Related Work and Discussion
Conclusion and Future Work
Basic Notation
Interpreting ReED as a Generalization of Existing KGRL Methods
Representing Existing KGRL Encoders Using RAMP Encoder
...and 12 more sections

Key Result

Theorem 4.3

Let $f_{{\bf{w}}}:\mathcal{V}\times\mathcal{R}\times\mathcal{V}\rightarrow\mathbb{R}^2$ be a deterministic triplet classifier with parameters ${\bf{w}}$, and let $\mathcal{P}$ be any prior distribution on ${\bf{w}}$. Let us consider the finite full triplet set $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{R}\times\mathcal{V}$ [...] where $\mathcal{L}_{\gamma,\widehat{\mathcal{E}}}(f_{{\bf{w}}})$ is defined in Definition 4.1 [...]
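The theorem refers to a $\gamma$-margin loss for a classifier with two outputs per triplet. The paper's exact Definition 4.1 is not reproduced on this page, so the sketch below assumes the standard margin-loss form from PAC-Bayesian analyses: the fraction of triplets whose true-class score fails to exceed the other class score by more than $\gamma$. Under that assumption, $\gamma = 0$ recovers the ordinary classification loss. The function name `margin_loss`, the label convention (1 = true triplet, 0 = false triplet), and the toy scores are hypothetical.

```python
def margin_loss(scores, labels, gamma=0.0):
    """Empirical gamma-margin loss (assumed standard form).

    scores: list of (s0, s1) pairs, the two outputs of a triplet classifier
    labels: list of 0/1 labels; label y selects the "true-class" score s_y
    gamma:  required margin between the true-class and other-class scores
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    violations = 0
    for (s0, s1), y in zip(scores, labels):
        true_score, other_score = (s1, s0) if y == 1 else (s0, s1)
        # A triplet counts as an error if the margin is not strictly exceeded.
        if true_score <= gamma + other_score:
            violations += 1
    return violations / len(scores)


# Toy classifier outputs for four triplets (illustrative values only).
scores = [(0.2, 1.5), (1.0, 0.1), (0.4, 0.6), (0.9, 0.3)]
labels = [1, 0, 1, 1]

classification_loss = margin_loss(scores, labels, gamma=0.0)  # 0.25
margin_half = margin_loss(scores, labels, gamma=0.5)          # 0.5
```

The third triplet is classified correctly but only by a margin of 0.2, so it counts as an error once $\gamma = 0.5$. Under this form the margin loss never decreases as $\gamma$ grows, which is why margin-based bounds of this type typically trade a larger empirical term against a smaller complexity term.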

Figures (3)

Figure 1: Using different instantiations and combinations of the RAMP encoder and the triplet classification decoder, ReED can express many existing KGRL methods.
Figure 2: Generalization errors of ReED according to different aggregators, norms of the weight matrices, and numbers of layers in the RAMP encoder. In ReED, one of two triplet classification decoders, TD or SM, is used. The trends in generalization error across these three factors align with the theoretical findings in Corollary 4.6.
Figure 3: Generalization Errors of ReED on FB15K237 according to different maximum dimensions $d$.

Theorems & Definitions (13)

Definition 3.1: RAMP Encoder for KGRL
Definition 3.2: Translational Distance Decoder
Definition 3.3: Semantic Matching Decoder
Definition 4.1: $\gamma$-Margin Loss of Triplet Classifier
Definition 4.2: Classification Loss of Triplet Classifier
Theorem 4.3: Transductive PAC-Bayesian Generalization Bound for a Deterministic Triplet Classifier
Theorem 4.4: Generalization Bound for ReED with Translational Distance Decoder
Theorem 4.5: Generalization Bound for ReED with Semantic Matching Decoder
Corollary 4.6: Simplified Form of the Generalization Bounds for ReED
Lemma 3.1: [pactrans], Corollary 7
...and 3 more
