PAC-Bayesian Generalization Bounds for Knowledge Graph Representation Learning
Jaejun Lee, Minsung Hwang, Joyce Jiyoung Whang
TL;DR
The paper addresses the lack of theoretical guarantees for knowledge graph representation learning (KGRL) by deriving the first PAC-Bayesian generalization bounds for KGRL methods. It introduces ReED (Relation-aware Encoder-Decoder), a generic framework that pairs a relation-aware message passing (RAMP) encoder with one of two triplet classification decoders, translational distance (TD) or semantic matching (SM), and that can express a broad set of existing KGRL methods. The main contributions are a transductive PAC-Bayesian bound for deterministic triplet classifiers and the resulting ReED-specific bounds, which quantify how depth, parameter count, and weight norms affect generalization; simplified forms of these bounds highlight the benefits of mean aggregators and parameter sharing. Empirically, the authors test these factors on three real-world knowledge graphs and find that mean aggregators, smaller parameter counts, and controlled weight norms coincide with smaller generalization gaps, so the theory captures trends seen in practice. The work offers principled design guidance for KGRL methods and lays groundwork for extending PAC-Bayesian analyses to broader KGRL architectures, including attention-based models, which may in turn benefit practical knowledge graph completion systems.
Abstract
While a number of knowledge graph representation learning (KGRL) methods have been proposed over the past decade, very few theoretical analyses have been conducted on them. In this paper, we present the first PAC-Bayesian generalization bounds for KGRL methods. To analyze a broad class of KGRL models, we propose a generic framework named ReED (Relation-aware Encoder-Decoder), which consists of a relation-aware message passing encoder and a triplet classification decoder. Our ReED framework can express at least 15 different existing KGRL models, including not only graph neural network-based models such as R-GCN and CompGCN but also shallow-architecture models such as RotatE and ANALOGY. Our generalization bounds for the ReED framework provide theoretical grounds for the commonly used tricks in KGRL, e.g., parameter-sharing and weight normalization schemes, and guide desirable design choices for practical KGRL methods. We empirically show that the critical factors in our generalization bounds can explain actual generalization errors on three real-world knowledge graphs.
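To make the encoder-decoder structure concrete, the following is a minimal sketch of one relation-aware message passing layer with a mean aggregator, followed by a triplet classifier. It is an illustration under stated assumptions, not the paper's formulation: the function names (`mean_message_passing`, `td_score`, `classify`), the ReLU activation, the TransE-style translational score, the threshold, and the toy graph are all hypothetical choices made for this example.

```python
import math
from collections import defaultdict


def matvec(weight, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def mean_message_passing(entity_emb, triplets, relation_weights, self_weight):
    """One relation-aware layer (illustrative assumption, not the paper's exact form).

    Each entity adds a self-transformed term to the MEAN of messages from its
    incoming neighbours, where each message is transformed by a
    relation-specific weight matrix. A ReLU is applied at the end.
    """
    incoming = defaultdict(list)
    for head, rel, tail in triplets:
        incoming[tail].append((head, rel))

    updated = {}
    for ent, vec in entity_emb.items():
        out = matvec(self_weight, vec)
        msgs = incoming.get(ent, [])
        for head, rel in msgs:
            msg = matvec(relation_weights[rel], entity_emb[head])
            # Dividing by the neighbour count is what makes this a mean aggregator.
            out = [o + m / len(msgs) for o, m in zip(out, msg)]
        updated[ent] = [max(0.0, x) for x in out]
    return updated


def td_score(head_vec, rel_vec, tail_vec):
    """TransE-style score -||h + r - t||_2, used here only as an example decoder."""
    return -math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head_vec, rel_vec, tail_vec))
    )


def classify(score, threshold=-0.5):
    """Deterministic triplet classifier: True if the score clears the threshold."""
    return score >= threshold


# Toy knowledge graph (hypothetical data): two triplets pointing at entity "c".
identity = [[1.0, 0.0], [0.0, 1.0]]
entity_emb = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
triplets = [("a", "r1", "c"), ("b", "r1", "c")]

updated = mean_message_passing(entity_emb, triplets, {"r1": identity}, identity)
score = td_score(updated["a"], [0.5, 1.5], updated["c"])
is_true_triplet = classify(score)
```

In this sketch the mean aggregator keeps the magnitude of an entity's update independent of how many neighbours it has, which is the kind of design choice the bounds are said to favor; swapping the division for a plain sum would give a sum aggregator instead.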
