Multi-domain Knowledge Graph Collaborative Pre-training and Prompt Tuning for Diverse Downstream Tasks

Yichi Zhang, Binbin Hu, Zhuo Chen, Lingbing Guo, Ziqi Liu, Zhiqiang Zhang, Lei Liang, Huajun Chen, Wen Zhang

TL;DR

This work tackles the training inefficiency and limited transferability of knowledge graph pre-training by introducing MuDoK, a framework that combines collaborative pre-training (CoPT) over multi-domain KGs with lightweight prefix prompt tuning (PPT) to support diverse downstream tasks. It also introduces KPI, a new benchmark built from open-source data, to enable reproducible evaluation across domains and tasks such as recommendation and text understanding. MuDoK’s CoPT stage learns domain-aware representations, while PPT provides a unified, parameter-efficient interface that can be plugged into various task backbones. Experimental results show consistent improvements over baselines across multiple domains and backbone models, highlighting the practical value for real-world KG-based systems and providing an open-source benchmark for future research.

Abstract

Knowledge graphs (KGs) provide reliable external knowledge for a wide variety of AI tasks in the form of structured triples. Knowledge graph pre-training (KGP) aims to pre-train neural networks on large-scale KGs and provide unified interfaces to enhance different downstream tasks, which is a key direction for KG management, maintenance, and applications. Existing works often focus on pure research questions in open domains, or they are not open source because of data security and privacy concerns in real scenarios. Meanwhile, existing studies have not explored the training efficiency and transferability of KGP models in depth. To address these problems, we propose a framework, MuDoK, that performs multi-domain collaborative pre-training and efficient prefix prompt tuning to serve diverse downstream tasks such as recommendation and text understanding. Our design is a plug-and-play prompt learning approach that can be flexibly adapted to different downstream task backbones. In response to the lack of open-source benchmarks, we construct a new multi-domain KGP benchmark called KPI, with two large-scale KGs and six different sub-domain tasks, to evaluate our method, and we open-source it for subsequent research. We evaluate our approach on the constructed KPI benchmark using diverse backbone models in heterogeneous downstream tasks. The experimental results show that our framework brings significant performance gains and demonstrate its generality, efficiency, and transferability.

Paper Structure

This paper contains 27 sections, 14 equations, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: Overview of the proposed framework, MuDoK. MuDoK consists of a collaborative pre-training (CoPT) stage and a prefix prompt tuning (PPT) stage: it is first pre-trained on large-scale multi-domain item KGs and then fine-tuned on item-aware downstream tasks, such as recommendation and text understanding, with a lightweight prefix prompt token.
  • Figure 2: Ablation study results on the three Amazon domains. We design five groups of experiments to validate the effectiveness of the design choices in MuDoK. G1: Full Model; G2: w/o PPT; G3: w/o CoPT; G4: w/o $\mathcal{L}_{con}$; G5: w/o $\mathcal{L}_{kg}$.
  • Figure 3: Efficiency analysis of MuDoK. We report the training time on several recommendation backbones.
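
As a rough illustration of the "lightweight prefix prompt token" mentioned in the Figure 1 caption, the sketch below prepends one extra token, derived from a frozen pre-trained item embedding, to a backbone's input sequence. The linear projection, the dimensions, and the names `project` and `add_prefix_token` are assumptions made for illustration only; this is not the paper's implementation.

```python
import random


def project(vec, weight):
    """Linear map; `weight` is a list of rows, one row per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def add_prefix_token(token_embs, item_kg_emb, proj_weight):
    """Prepend one prefix token built from a KG item embedding.

    token_embs:  list of d_model-dim vectors (the backbone's usual input)
    item_kg_emb: d_kg-dim vector from a pre-trained KG encoder (assumed frozen)
    proj_weight: d_model x d_kg matrix (hypothetical trainable projection)
    """
    prefix = project(item_kg_emb, proj_weight)
    return [prefix] + token_embs


random.seed(0)
d_kg, d_model, seq_len = 4, 3, 5  # toy sizes, chosen arbitrarily

item_kg_emb = [random.gauss(0, 1) for _ in range(d_kg)]
proj_weight = [[random.gauss(0, 0.1) for _ in range(d_kg)] for _ in range(d_model)]
token_embs = [[random.gauss(0, 1) for _ in range(d_model)] for _ in range(seq_len)]

extended = add_prefix_token(token_embs, item_kg_emb, proj_weight)
print(len(extended), len(extended[0]))  # sequence grows by one token: 6 3
```

In this toy setup only `proj_weight` would be updated during tuning, which is one way a prefix of this kind could stay parameter-efficient and independent of the backbone architecture.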