FedBook: A Unified Federated Graph Foundation Codebook with Intra-domain and Inter-domain Knowledge Modeling

Zhengyu Wu, Yinlin Zhu, Xunkai Li, Ziang Qiu, Rong-Hua Li, Guoren Wang, Chenghu Zhou

TL;DR

FedBook tackles privacy-preserving, cross-domain graph pre-training by learning a unified global codebook within the FedGFM framework. It introduces two phases: Phase 1 (Intra-domain Collaboration) refines low-frequency tokens by referencing high-frequency tokens through token-level similarities $S_{i,j}^{a,b}$ and a personalized aggregation of non-codebook parameters, while Phase 2 (Inter-domain Integration) weights contributions by domain distinctiveness $\nabla^a$ to preserve cross-domain heterogeneity via the global codebook $\boldsymbol{\tilde{W}}^g$. Empirically, FedBook outperforms 21 baselines across 8 benchmarks spanning node, edge, and graph tasks, with robust ablations, sensitivity analyses, and few-shot results supporting the effectiveness of the dual-phase aggregation. The work demonstrates practical, privacy-preserving benefits for federated graph foundation models and provides a concrete, scalable approach to unify multi-domain knowledge in decentralized settings, enabling stronger generalization with faster convergence.
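To make Phase 1 more concrete, here is a minimal Python sketch of similarity-guided refinement of low-frequency tokens. It is not the authors' implementation: the usage-count threshold `freq_threshold`, the use of cosine similarity as a stand-in for $S_{i,j}^{a,b}$, the softmax temperature `tau`, the interpolation weight `alpha`, and the function names are all hypothetical choices made for illustration. Each low-frequency token of client $a$ is pulled toward a similarity-weighted average of the high-frequency tokens held by the other clients, while high-frequency tokens are left unchanged.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def refine_low_freq_tokens(codebooks, freqs, freq_threshold, alpha=0.5, tau=1.0):
    """Hypothetical Phase 1 sketch.

    codebooks[a][i] is token i of client a (a list of floats) and
    freqs[a][i] is how often client a selected that token locally.
    Tokens used at least `freq_threshold` times count as high-frequency.
    """
    refined = []
    for a, book in enumerate(codebooks):
        # Reliable reference tokens: high-frequency tokens of every other client.
        refs = [
            codebooks[b][j]
            for b in range(len(codebooks)) if b != a
            for j in range(len(codebooks[b])) if freqs[b][j] >= freq_threshold
        ]
        new_book = []
        for i, token in enumerate(book):
            if freqs[a][i] >= freq_threshold or not refs:
                new_book.append(list(token))  # keep high-frequency tokens unchanged
                continue
            # Token-level similarities (stand-in for S_{i,j}^{a,b}), softmax-normalized.
            sims = [cosine(token, r) / tau for r in refs]
            top = max(sims)
            exps = [math.exp(s - top) for s in sims]
            z = sum(exps)
            target = [sum(w * r[d] for w, r in zip(exps, refs)) / z
                      for d in range(len(token))]
            # Interpolate the low-frequency token toward its similarity-weighted reference.
            new_book.append([(1 - alpha) * t + alpha * q for t, q in zip(token, target)])
        refined.append(new_book)
    return refined
```

With a very small `tau`, the softmax concentrates on the single most similar reliable token, so with `alpha=1.0` a low-frequency token moves almost entirely onto its best match.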

Abstract

Foundation models have shown remarkable cross-domain generalization in language and vision, inspiring the development of graph foundation models (GFMs). However, existing GFMs typically assume centralized access to multi-domain graphs, which is often infeasible due to privacy and institutional constraints. Federated Graph Foundation Models (FedGFMs) address this limitation, but their effectiveness fundamentally hinges on constructing a robust global codebook that achieves intra-domain coherence by consolidating mutually reinforcing semantics within each domain, while also maintaining inter-domain diversity by retaining heterogeneous knowledge across domains. To this end, we propose FedBook, a unified federated graph foundation codebook that systematically aggregates clients' local codebooks during server-side federated pre-training. FedBook follows a two-phase process: (1) Intra-domain Collaboration, where low-frequency tokens are refined by referencing more semantically reliable high-frequency tokens across clients to enhance domain-specific coherence; and (2) Inter-domain Integration, where client contributions are weighted by the semantic distinctiveness of their codebooks during the aggregation of the global GFM, thereby preserving cross-domain diversity. Extensive experiments on 8 benchmarks across multiple domains and tasks demonstrate that FedBook consistently outperforms 21 baselines, including isolated supervised learning, FL/FGL, federated adaptations of centralized GFMs, and FedGFM techniques.
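Phase 2 can likewise be sketched under explicit assumptions. The abstract states only that client contributions are weighted by the semantic distinctiveness of their codebooks. The concrete score used here, which takes the distinctiveness $\nabla^a$ of client $a$ to be the average of one minus the best cosine match between each of its tokens and all tokens held by other clients, and the plain normalized weighted average of flattened parameters are hypothetical choices, as are the function names `distinctiveness` and `integrate`.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def distinctiveness(codebooks):
    """Hypothetical stand-in for nabla^a: average of (1 - best cosine match)
    between each token of client a and all tokens held by other clients."""
    scores = []
    for a, book in enumerate(codebooks):
        others = [t for b, other in enumerate(codebooks) if b != a for t in other]
        if not others:
            scores.append(1.0)  # a lone client is treated as fully distinct
            continue
        gaps = [1.0 - max(cosine(tok, o) for o in others) for tok in book]
        scores.append(sum(gaps) / len(gaps))
    return scores


def integrate(params, codebooks, eps=1e-8):
    """Average flattened client parameters with weights proportional to
    distinctiveness; eps keeps the weights defined when all scores are zero."""
    nabla = [s + eps for s in distinctiveness(codebooks)]
    total = sum(nabla)
    weights = [n / total for n in nabla]
    global_params = [sum(w * p[d] for w, p in zip(weights, params))
                     for d in range(len(params[0]))]
    return global_params, weights
```

In a toy case where two clients hold identical codebooks and a third holds an orthogonal one, the duplicated clients receive almost no weight. That illustrates the diversity-preserving intent; a practical scheme would probably temper such extremes, for example by mixing with uniform or data-size weights.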

Paper Structure

This paper contains 20 sections, 11 equations, 5 figures, 4 tables, 1 algorithm.

Figures (5)

  • Figure 1: Illustration of the FedGFM paradigm, including Federated Pre-Training and Task-Specific Fine-Tuning phases.
  • Figure 2: Overview of the proposed FedBook framework under the FedGFM paradigm. In each communication round, clients perform local pre-training with their gVQ-MAE on local graph data, while the server sequentially conducts intra-domain collaboration followed by inter-domain integration to preserve intra-domain coherence and inter-domain diversity, ultimately yielding an effective global GFM.
  • Figure 3: Sensitivity analysis for the trade-off parameter $\lambda$.
  • Figure 4: Sensitivity analysis of the codebook architecture with respect to the number of heads $H$ and learnable tokens per head $N$.
  • Figure 5: Convergence curves comparing FedBook and FedGFM+, demonstrating the efficiency advantage of FedBook.
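Figure 4 varies the number of heads $H$ and learnable tokens per head $N$. One common way to organize such a multi-head codebook, assumed here rather than taken from the paper, is to split an embedding of size $D$ into $H$ equal chunks and snap each chunk to its nearest token in that head's $N$-token codebook. The minimal sketch below, built around the hypothetical function `quantize_multihead`, only illustrates how $H$ and $N$ enter such a lookup; the actual gVQ-MAE quantizer may differ.

```python
def quantize_multihead(z, codebooks):
    """Hypothetical multi-head lookup.

    codebooks[h] holds the N tokens of head h, each of length len(z) // H.
    Returns the chosen token index per head and the concatenated quantized vector.
    """
    num_heads = len(codebooks)
    if len(z) % num_heads != 0:
        raise ValueError("embedding size must be divisible by the number of heads")
    d = len(z) // num_heads
    indices, quantized = [], []
    for h in range(num_heads):
        chunk = z[h * d:(h + 1) * d]
        # Squared Euclidean distance from this chunk to each of head h's N tokens.
        dists = [sum((c - t) ** 2 for c, t in zip(chunk, tok)) for tok in codebooks[h]]
        best = min(range(len(dists)), key=dists.__getitem__)
        indices.append(best)
        quantized.extend(codebooks[h][best])
    return indices, quantized
```

Under this layout, increasing $H$ shrinks each chunk and multiplies the number of distinct code combinations, while increasing $N$ enlarges the vocabulary of each head.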