Subgraph Federated Learning for Local Generalization

Sungwon Kim, Yoonho Lee, Yunhak Oh, Namkyeong Lee, Sukwon Yun, Junseok Lee, Sein Kim, Carl Yang, Chanyoung Park

TL;DR

This work tackles local overfitting and unseen data in subgraph Federated Learning by introducing FedLoG, which condenses reliable cross-client knowledge into global synthetic data using head-degree and head-class signals. Local models learn with two prototypical branches, while a server-side aggregation builds global synthetic nodes that capture structural and class information, enabling local generalization through feature scaling and prompt-based data augmentation. The approach achieves strong performance on Seen Graphs and substantially improves generalization to Unseen Node, Missing Class, and New Client settings across five real-world datasets, with ablations confirming the necessity of Local Generalization and the benefits of head-knowledge condensation. FedLoG also analyzes privacy implications, showing that condensed synthetic data and gradient perturbations provide robustness against reconstruction attacks while preserving utility for generative cross-client learning. Overall, FedLoG offers a practical, privacy-conscious framework for robust subgraph-FL in dynamic graphs with evolving label distributions.

Abstract

Federated Learning (FL) on graphs enables collaborative model training to enhance performance without compromising the privacy of each client. However, existing methods often overlook the mutable nature of graph data, which frequently introduces new nodes and leads to shifts in label distribution. Since they focus solely on performing well on each client's local data, they are prone to overfitting to their local distributions (i.e., local overfitting), which hinders their ability to generalize to unseen data with diverse label distributions. In contrast, our proposed method, FedLoG, effectively tackles this issue by mitigating local overfitting. Our model generates global synthetic data by condensing the reliable information from each class representation and its structural information across clients. Using these synthetic data as a training set, we alleviate the local overfitting problem by adaptively generalizing the absent knowledge within each local dataset. This enhances the generalization capabilities of local models, enabling them to handle unseen data effectively. Our model outperforms baselines in our proposed experimental settings, which are designed to measure generalization power to unseen data in practical scenarios. Our code is available at https://github.com/sung-won-kim/FedLoG

Paper Structure

This paper contains 79 sections, 20 equations, 11 figures, 15 tables, 1 algorithm.

Figures (11)

  • Figure 1: Data Reliability Analysis (PubMed used).
  • Figure 2: Overview of FedLoG with 2 Clients and 3 Classes.
  • Figure 3: Impact of headness of class/degree for various scenarios (Amazon Clothing - 3 Clients).
  • Figure 4: Ablation Studies (CiteSeer - 3 Clients).
  • Figure 5: 2D PCA visualization of feature distributions for the same class in the CiteSeer dataset.
  • ...and 6 more figures