Flock: A Knowledge Graph Foundation Model via Learning on Random Walks

Jinwoo Kim, Xingyue Huang, Krzysztof Olejniczak, Kyungbin Min, Michael Bronstein, Seunghoon Hong, İsmail İlkan Ceylan

TL;DR

The paper tackles zero-shot link prediction on knowledge graphs containing unseen nodes and relation types. It introduces probabilistic node-relation equivariance and a KG foundation model, Flock, which replaces traditional message passing with query-conditioned random walks, an anonymization-based recording protocol, a sequence processor, and a consensus pooling step. The authors prove that Flock is invariant in probability and a universal approximator of isomorphism-invariant link-level functions on KGs. Empirically, Flock solves the synthetic Petals dataset, on which current KGFMs fail, and achieves state-of-the-art results on 54 KG benchmarks for both entity and relation prediction in zero-shot and finetuned settings. The authors also analyze how performance scales with the pretraining graph mix and ensemble size, and show that adapting the number of walks at test time is beneficial.

Abstract

We study the problem of zero-shot link prediction on knowledge graphs (KGs), which requires models to generalize over novel entities and novel relations. Knowledge graph foundation models (KGFMs) address this task by enforcing equivariance over both nodes and relations, learning from structural properties of nodes and relations, which are then transferable to novel graphs with similar structural properties. However, the conventional notion of deterministic equivariance imposes inherent limits on the expressive power of KGFMs, preventing them from distinguishing structurally similar but semantically distinct relations. To overcome this limitation, we introduce probabilistic node-relation equivariance, which preserves equivariance in distribution while incorporating a principled randomization to break symmetries during inference. Building on this principle, we present Flock, a KGFM that iteratively samples random walks, encodes them into sequences via a recording protocol, embeds them with a sequence model, and aggregates representations of nodes and relations via learned pooling. Crucially, Flock respects probabilistic node-relation equivariance and is a universal approximator for isomorphism-invariant link-level functions over KGs. Empirically, Flock perfectly solves our new diagnostic dataset Petals where current KGFMs fail, and achieves state-of-the-art performance on entity and relation prediction tasks on 54 KGs from diverse domains.
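The first two stages named in the abstract, sampling a random walk and recording it as an anonymized sequence, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: the toy triples, the function names `sample_walk` and `anonymize`, the uniform (not query-conditioned) choice of the next edge, and the convention of numbering nodes and relations by order of first appearance.

```python
import random

# Hypothetical toy KG as (head, relation, tail) triples.
triples = [("a", "like", "b"), ("b", "dislike", "c"), ("c", "like", "a")]

def sample_walk(triples, start, length, rng):
    """Uniform random walk; edges may be traversed forward (+1) or backward (-1)."""
    adj = {}
    for h, r, t in triples:
        adj.setdefault(h, []).append((r, +1, t))
        adj.setdefault(t, []).append((r, -1, h))
    walk, node = [start], start
    for _ in range(length):
        if node not in adj:  # isolated node: stop early
            break
        rel, direction, nxt = rng.choice(adj[node])
        walk += [(rel, direction), nxt]
        node = nxt
    return walk

def anonymize(walk):
    """Replace node and relation names by the order in which they first appear."""
    node_ids, rel_ids, record = {}, {}, []
    for i, item in enumerate(walk):
        if i % 2 == 0:  # even positions hold nodes
            record.append(("n", node_ids.setdefault(item, len(node_ids) + 1)))
        else:           # odd positions hold (relation, direction) pairs
            rel, direction = item
            record.append(("r", rel_ids.setdefault(rel, len(rel_ids) + 1), direction))
    return record

rng = random.Random(0)
walk = sample_walk(triples, "a", 4, rng)
print(walk)
print(anonymize(walk))
```

Because the record depends only on the order of first appearance, renaming the nodes and relations of the KG leaves the anonymized sequence of a given walk unchanged, which is the property that lets a sequence model trained on one graph be applied to another.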

Paper Structure

This paper contains 37 sections, 12 theorems, 79 equations, 5 figures, 21 tables.

Key Result

Proposition 4.1

With a powerful enough sequence processor $f_\theta$, the $\textsc{Flock}$ framework described above is a universal approximator of link invariant functions over $\mathbb{K}_{n,m}$ for all pairs $(n,m)$.
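For orientation, link invariance can be written out as follows. This is a sketch under assumptions that this summary does not state: $\mathbb{K}_{n,m}$ is read as the set of KGs with $n$ nodes and $m$ relation types, $S_n$ and $S_m$ are the symmetric groups on nodes and relations, and $\varphi$ is a hypothetical function of a KG $G$ and a query triple $(h, r, t)$.

$$
\varphi\big((\pi,\rho)\cdot G,\ (\pi(h),\rho(r),\pi(t))\big) \;=\; \varphi\big(G,\ (h,r,t)\big)
\qquad \text{for all } G \in \mathbb{K}_{n,m},\ \pi \in S_n,\ \rho \in S_m,
$$

where $(\pi,\rho)\cdot G$ denotes $G$ with nodes relabeled by $\pi$ and relations relabeled by $\rho$. Under this reading, the proposition states that $\textsc{Flock}$ can approximate any such $\varphi$, provided the sequence processor $f_\theta$ is sufficiently expressive.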

Figures (5)

  • Figure 1: A KG representing characters' relationships in Star Wars movies. Blue arrows indicate $\mathsf{like}$, red arrows indicate $\mathsf{dislike}$, and green arrows indicate $\mathsf{friendWith}$.
  • Figure 2: Overall pipeline of $\textsc{Flock}$. In each update step, $\textsc{Flock}$ samples random walks on the KG, anonymizes the encountered nodes and relations via a recording protocol, and feeds the resulting sequences into a sequence processor to compute node and relation representations. A consensus protocol then pools them back to the original KG's nodes and relations.
  • Figure 3: Example KG from Petals. KGFMs with relational invariants must equate blue $r_1$ and red $r_2$, thus predicting the same scores for both dashed queries with $r_0$.
  • Figure 4: Pretraining and test-time scaling of $\textsc{Flock}$ on 41 inductive KG datasets.
  • Figure 5: An example of a graph from Petals with $c=4$, $l=2$ and $t=3$, and the associated link prediction instances (dashed). The relation types 'red', 'blue', 'pink' and 'yellow' are structurally isomorphic and are therefore equated by existing KGFMs.
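The symmetry described in the Petals captions can be checked on a small example. The sketch below uses a hypothetical two-petal toy graph, not the actual Petals construction, and the helper name `relabel` is an assumption. It verifies that swapping the relations $r_1$ and $r_2$ together with their petal nodes maps the KG onto itself.

```python
def relabel(triples, node_map, rel_map):
    """Apply a node renaming and a relation renaming to every triple."""
    return {
        (node_map.get(h, h), rel_map.get(r, r), node_map.get(t, t))
        for h, r, t in triples
    }

# Hypothetical toy KG: a center node "c" with one petal per relation.
kg = {("c", "r1", "a"), ("a", "r1", "c"), ("c", "r2", "b"), ("b", "r2", "c")}

swap_nodes = {"a": "b", "b": "a"}
swap_rels = {"r1": "r2", "r2": "r1"}

# Joint swap of nodes and relations leaves the triple set unchanged.
is_automorphism = relabel(kg, swap_nodes, swap_rels) == kg
# Swapping only the relations does not.
rels_only = relabel(kg, {}, swap_rels) == kg

print(is_automorphism, rels_only)  # True False
```

Since the joint swap is an automorphism of this toy KG, any deterministic function that is equivariant to node and relation relabelings must assign $r_1$ and $r_2$ the same representation, which matches the limitation the captions attribute to existing KGFMs.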

Theorems & Definitions (24)

  • Proposition 4.1
  • Proposition 4.2
  • Proposition 4.3
  • Lemma C.1
  • proof
  • Remark C.2
  • Lemma C.3
  • proof
  • Lemma C.4
  • proof
  • ...and 14 more