Towards Foundation Models for Knowledge Graph Reasoning

Mikhail Galkin, Xinyu Yuan, Hesham Mostafa, Jian Tang, Zhaocheng Zhu

TL;DR

This work tackles the challenge of transferring knowledge graph reasoning across graphs with arbitrary entity and relation vocabularies by proposing ULTRA, a foundation-model-like approach. ULTRA constructs a graph of relations and learns conditional relation representations that depend on interaction patterns between relations, enabling zero-shot generalization to unseen graphs and fine-tuning for downstream tasks. Empirical results on 57 diverse KGs show that a single pre-trained ULTRA model often matches or exceeds state-of-the-art supervised baselines in zero-shot inference, with fine-tuning delivering further gains (average improvements of around 10% in mean reciprocal rank, MRR). The approach demonstrates the potential of transferable, feature-free graph representations for KG reasoning and paves the way for more scalable cross-graph knowledge integration in domains ranging from biology to culture.
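The graph of relations mentioned above can be illustrated with a small sketch. This is not the authors' implementation: `build_relation_graph` is a hypothetical helper, triples are assumed to be `(head, relation, tail)` tuples, and the direction convention used for h2t and t2h is an assumption of this sketch. It only shows that the four interaction types (h2h, h2t, t2h, t2t) are defined by how relations share entities, so the resulting graph never refers to a particular relation vocabulary.

```python
from collections import defaultdict


def build_relation_graph(triples):
    """Hypothetical helper: derive typed edges between relations from KG triples.

    Returns a set of (r1, interaction, r2) tuples. Convention assumed here:
      h2h: r1 and r2 share a head entity
      t2t: r1 and r2 share a tail entity
      h2t: some entity is a head of r1 and a tail of r2
      t2h: some entity is a tail of r1 and a head of r2
    Self-interactions (r1 == r2) are kept.
    """
    heads = defaultdict(set)  # entity -> relations in which it appears as head
    tails = defaultdict(set)  # entity -> relations in which it appears as tail
    for h, r, t in triples:
        heads[h].add(r)
        tails[t].add(r)

    edges = set()
    for e in set(heads) | set(tails):
        for r1 in heads[e]:
            edges.update((r1, "h2h", r2) for r2 in heads[e])
            edges.update((r1, "h2t", r2) for r2 in tails[e])
        for r1 in tails[e]:
            edges.update((r1, "t2h", r2) for r2 in heads[e])
            edges.update((r1, "t2t", r2) for r2 in tails[e])
    return edges


# Toy KG: relation names are arbitrary; only how relations co-occur on entities matters.
triples = [
    ("alice", "born_in", "paris"),
    ("alice", "works_in", "paris"),
    ("paris", "capital_of", "france"),
]
edges = build_relation_graph(triples)
for edge in sorted(edges):
    print(edge)
```

Renaming every relation in `triples` would yield the same edge structure under the new names, which is the sense in which the four interactions are vocabulary-independent.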

Abstract

Foundation models in language and vision have the ability to run inference on any textual and visual inputs thanks to transferable representations such as a vocabulary of tokens in language. Knowledge graphs (KGs) have different entity and relation vocabularies that generally do not overlap. The key challenge of designing foundation models on KGs is to learn such transferable representations that enable inference on any graph with arbitrary entity and relation vocabularies. In this work, we take a step towards such foundation models and present ULTRA, an approach for learning universal and transferable graph representations. ULTRA builds relational representations as a function conditioned on their interactions. Such a conditioning strategy allows a pre-trained ULTRA model to inductively generalize to any unseen KG with any relation vocabulary and to be fine-tuned on any graph. Conducting link prediction experiments on 57 different KGs, we find that the zero-shot inductive inference performance of a single pre-trained ULTRA model on unseen graphs of various sizes is often on par with or better than strong baselines trained on specific graphs. Fine-tuning further boosts the performance.
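The idea of relation representations "conditioned on their interactions" can be illustrated with a toy message-passing loop over a graph of relations. This is not the paper's architecture: `conditional_relation_reps`, the indicator initialization of the query relation, the tanh update, and the random untrained weights are all assumptions of this sketch. It demonstrates the property the abstract relies on: the only parameters are one matrix per fundamental interaction, so the same weights apply to any relation vocabulary, and renaming relations merely permutes the output rows.

```python
import numpy as np

INTERACTIONS = ("h2h", "h2t", "t2h", "t2t")


def conditional_relation_reps(num_relations, edges, query, weights, layers=3):
    """Hypothetical sketch: relation representations conditioned on a query relation.

    edges: iterable of (r1, interaction, r2) with integer relation ids.
    weights: one (dim, dim) matrix per fundamental interaction, shared across graphs.
    """
    dim = next(iter(weights.values())).shape[0]
    h = np.zeros((num_relations, dim))
    h[query] = 1.0  # indicator init: only the query relation starts non-zero
    for _ in range(layers):
        msg = np.zeros_like(h)
        for r1, kind, r2 in edges:
            msg[r2] += h[r1] @ weights[kind]  # message r1 -> r2, typed by interaction
        h = np.tanh(h + msg)
    return h


rng = np.random.default_rng(0)
dim = 8
# Untrained random weights stand in for learned parameters.
weights = {k: rng.normal(scale=dim ** -0.5, size=(dim, dim)) for k in INTERACTIONS}

# Toy graph over 4 relations; relation 3 takes part in no interaction.
edges = [(0, "t2h", 1), (1, "h2t", 0), (0, "h2h", 2), (2, "h2h", 0)]
reps_q0 = conditional_relation_reps(4, edges, query=0, weights=weights)
reps_q1 = conditional_relation_reps(4, edges, query=1, weights=weights)

# Renaming relation ids (old id i -> new id perm[i]) only permutes the output rows.
perm = np.array([2, 0, 3, 1])
edges_renamed = [(perm[a], k, perm[b]) for a, k, b in edges]
reps_renamed = conditional_relation_reps(4, edges_renamed, query=perm[0], weights=weights)
print(np.allclose(reps_renamed[perm], reps_q0))
```

Changing the query relation changes every representation, which is what makes them conditional rather than fixed per-relation embeddings.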

Paper Structure

This paper contains 26 sections, 4 equations, 6 figures, and 11 tables.

Figures (6)

  • Figure 1: Zero-shot and fine-tuned MRR (higher is better) of Ultra pre-trained on three graphs (FB15k-237, WN18RR, CoDEx-Medium). On average, zero-shot performance is better than the best reported baselines trained on specific graphs (0.395 vs 0.344). More results are in Figure 4 and in the main results table.
  • Figure 2: (a) Relative entity representations used in inductive models generalize to new entities; (b) relative relation representations based on a graph of relations generalize to both new relations and new entities. The graph of relations captures four fundamental interactions (h2h, h2t, t2h, t2t) that are independent of any graph-specific relation vocabulary and whose representations can be learned.
  • Figure 3: Given a query $(h, q, ?)$ on graph $\mathcal{G}$, Ultra (1) builds a graph of relations $\mathcal{G}_r$ with four fundamental interactions $\mathcal{R}_{\text{fund}}$; (2) builds relation representations $\mathbf{R}_q$ conditioned on the query relation $q$ and $\mathcal{G}_r$; (3) runs any inductive link predictor on $\mathcal{G}$ using representations $\mathbf{R}_q$.
  • Figure 4: Ultra performance on 14 inductive datasets from MTDEA and INDIGO; for 8 of them, only the approximate metric Hits@10 (50 negs) is available (center). We also report full MRR (left) and Hits@10 (right) computed on the entire entity set, demonstrating that Hits@10 (50 negs) overestimates the real performance.
  • Figure 5: Comparison of zero-shot and fine-tuned Ultra per-dataset performance against training a model from scratch on each dataset (Train e2e). Zero-shot performance of a single pre-trained model is on par with training from scratch while fine-tuning yields overall best results.
  • ...and 1 more figure
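Figure 4 states that Hits@10 (50 negs) overestimates the real performance. A minimal sketch of why, assuming that metric means ranking the true answer against 50 randomly sampled negatives instead of the entire entity set; the filtered setting (removing other known true answers from the candidates) is omitted, and all helper names are hypothetical. Because the sampled negatives are a subset of all candidates, the sampled rank can never exceed the full rank, so sampled Hits@10 and MRR are at least as high as their full-ranking counterparts.

```python
import random


def rank_among(scores, target, candidates):
    """1-based rank of `target` against `candidates`; ties are resolved optimistically."""
    return 1 + sum(1 for i in candidates if scores[i] > scores[target])


def full_rank(scores, target):
    others = [i for i in range(len(scores)) if i != target]
    return rank_among(scores, target, others)


def sampled_rank(scores, target, num_negatives, rng):
    others = [i for i in range(len(scores)) if i != target]
    return rank_among(scores, target, rng.sample(others, num_negatives))


def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at(ranks, k):
    return sum(r <= k for r in ranks) / len(ranks)


# Toy scores over 1000 candidate entities: entity i has full rank i + 1.
scores = [1000.0 - i for i in range(1000)]
targets = [4, 29, 99, 299]
rng = random.Random(0)

full = [full_rank(scores, t) for t in targets]
sampled = [sampled_rank(scores, t, 50, rng) for t in targets]

print("full    MRR %.3f  Hits@10 %.2f" % (mrr(full), hits_at(full, 10)))
print("50 negs MRR %.3f  Hits@10 %.2f" % (mrr(sampled), hits_at(sampled, 10)))
```

With 999 other entities and only 50 sampled, a true answer at full rank 30 faces on average fewer than two higher-scoring negatives, which is how a clear miss under full ranking can become a hit under the 50-negative protocol.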