Every Little Helps: Building Knowledge Graph Foundation Model with Fine-grained Transferable Multi-modal Tokens

Yichi Zhang; Zhuo Chen; Lingbing Guo; Wen Zhang; Huajun Chen

TL;DR

TOFU is a token-based foundation model for multi-modal knowledge graph reasoning (MMKGR) that generalizes across different multi-modal knowledge graphs (MMKGs). It discretizes structural, visual, and textual information into modality-specific tokens and processes them with a hierarchical fusion architecture and a mixture-of-message mechanism to obtain transferable features.

Abstract

Multi-modal knowledge graph reasoning (MMKGR) aims to predict missing links by exploiting both graph structure and multi-modal entity content. Most existing methods are designed for the transductive setting: they learn dataset-specific embeddings and struggle to generalize to new KGs. Recent knowledge graph foundation models (KGFMs) improve cross-KG transfer, but they mainly exploit structural patterns and ignore rich multi-modal signals. We address these gaps by proposing a token-based foundation model (TOFU) for MMKGR, which exhibits strong generalization across different MMKGs. TOFU discretizes structural, visual, and textual information into modality-specific tokens. It then employs a hierarchical fusion architecture with mixture-of-message mechanisms to process these tokens and obtain transferable features for MMKGR. Experimental results on 17 transductive, inductive, and fully-inductive MMKGs show that TOFU consistently outperforms strong KGFM and MMKGR baselines, delivering strong performance on unseen MMKGs.
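To make the idea of a fusion gate concrete, here is a minimal sketch of a generic gated fusion between a structural feature vector and a multi-modal feature vector. It is an illustration under stated assumptions, not TOFU's actual formulation: the abstract does not give the gate equation, so the scalar sigmoid gate, the function name `gated_fusion`, and the parameters `gate_weights` and `gate_bias` are all hypothetical.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(
    structural: List[float],
    multimodal: List[float],
    gate_weights: List[float],
    gate_bias: float = 0.0,
) -> List[float]:
    """Blend two equal-length feature vectors with a scalar gate.

    Assumed (hypothetical) form, not taken from the paper:
        g     = sigmoid(w . [structural; multimodal] + b)
        fused = g * structural + (1 - g) * multimodal
    """
    if len(structural) != len(multimodal):
        raise ValueError("feature vectors must have the same length")
    concat = structural + multimodal  # list concatenation, length 2d
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must have length 2 * feature dim")
    # Scalar gate computed from both inputs.
    g = sigmoid(sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias)
    # Convex combination of the two feature vectors.
    return [g * s + (1.0 - g) * m for s, m in zip(structural, multimodal)]


if __name__ == "__main__":
    # With zero weights and zero bias the gate is 0.5, so the result
    # is the element-wise average of the two inputs.
    print(gated_fusion([1.0, 3.0], [3.0, 5.0], [0.0, 0.0, 0.0, 0.0]))
```

A learned gate of this kind lets a model lean on structural features when an entity's images or text are missing or uninformative, and on multi-modal features otherwise. A per-dimension gate (a vector instead of a scalar) is an equally plausible design.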

Paper Structure (26 sections, 16 equations, 7 figures, 5 tables)

Introduction
Related Works
Preliminaries
Methodology
Transferable Multi-modal Tokens
Visual and Textual Tokens
Structural Token Modeling
Hierarchical Local Fusion
Structural Encoder
Multi-modal Encoder and Gated Fusion
Global Propagation with Mixture-of-Messages
Experiments
Experiment Settings
Main Experiments
Transductive MMKGR Experiments
...and 11 more sections

Figures (7)

Figure 1: A comparison between MMKGR/KGFM and TOFU, along with insights gained from current large language models.
Figure 2: An overview of our TOFU framework. TOFU first models each modality (structural/visual/textual) as discrete tokens. It then employs a hierarchical fusion architecture, consisting of a structural encoder, a multi-modal encoder, and a fusion gate, to obtain transferable entity and relation features. Finally, TOFU applies global aggregation with a mixture-of-message mechanism to gather multi-source information from the MMKG and makes the MMKGR prediction from the query-informed entity representations.
Figure 3: Detailed MRR results on the MMKGs. For each dataset, we annotate three baselines and TOFU's performance under the zero-shot and fine-tuning settings, sorted in ascending order.
Figure 4: Single-MMKG transfer experiments. A$\rightarrow$B denotes training on dataset A and testing on dataset B. The MMKG abbreviations are: YAGO15K (A), MKG-Y (B), MKG-W (C), WN18RR++ (D), FB15K-237 (E). We compare TOFU with two baselines under both zero-shot and fine-tuning settings.
Figure 5: Ablation study on module design, modality contribution, and message function selection. The study is run separately on 5 MMKGs: DB15K, MKG-W, MKG-Y, YAGO15K, and WN18RR++. The overall results are the average over these 5 datasets. All panels report MRR.
...and 2 more figures
