Subgraph Federated Learning for Local Generalization
Sungwon Kim, Yoonho Lee, Yunhak Oh, Namkyeong Lee, Sukwon Yun, Junseok Lee, Sein Kim, Carl Yang, Chanyoung Park
TL;DR
This work tackles local overfitting and poor generalization to unseen data in subgraph Federated Learning by introducing FedLoG, which condenses reliable cross-client knowledge into global synthetic data using head-degree and head-class signals. Local models learn with two prototypical branches, while server-side aggregation builds global synthetic nodes that capture structural and class information. Local generalization is then achieved through feature scaling and prompt-based data augmentation. The approach achieves strong performance on Seen Graphs and substantially improves generalization in the Unseen Node, Missing Class, and New Client settings across five real-world datasets, with ablations confirming the necessity of Local Generalization and the benefits of head-knowledge condensation. The paper also analyzes privacy implications, showing that condensed synthetic data and gradient perturbations provide robustness against reconstruction attacks while preserving utility for cross-client learning. Overall, FedLoG offers a practical, privacy-conscious framework for robust subgraph FL on dynamic graphs with evolving label distributions.
Abstract
Federated Learning (FL) on graphs enables collaborative model training to enhance performance without compromising the privacy of each client. However, existing methods often overlook the mutable nature of graph data, which frequently gains new nodes and undergoes shifts in label distribution. Because these methods focus solely on performing well on each client's local data, they are prone to overfitting to their local distributions (i.e., local overfitting), which hinders their ability to generalize to unseen data with diverse label distributions. In contrast, our proposed method, FedLoG, tackles this issue by mitigating local overfitting. Our model generates global synthetic data by condensing the reliable information from each class representation and its structural information across clients. Using these synthetic data as a training set, we alleviate the local overfitting problem by adaptively generalizing to the knowledge that is absent from each local dataset. This enhances the generalization capabilities of local models, enabling them to handle unseen data effectively. Our model outperforms baselines in our proposed experimental settings, which are designed to measure generalization power to unseen data in practical scenarios. Our code is available at https://github.com/sung-won-kim/FedLoG.
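To make the idea of condensing class representations across clients more concrete, the following is a minimal sketch and not FedLoG's actual procedure. It assumes that each client shares only a mean feature vector (a prototype) and a sample count per class, and that the server combines them with a count-weighted average, so that clients holding many samples of a class dominate that class's global representative. The function names `local_class_prototypes` and `condense_global_prototypes` are hypothetical, and the structural (degree) information described in the abstract is omitted entirely.

```python
from collections import defaultdict


def local_class_prototypes(features, labels):
    """Return {class: (mean feature vector, sample count)} for one client."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {c: ([s / counts[c] for s in sums[c]], counts[c]) for c in sums}


def condense_global_prototypes(client_summaries):
    """Count-weighted average of client prototypes, per class (an assumption,
    standing in for the paper's condensation step)."""
    weighted, totals = {}, defaultdict(int)
    for summary in client_summaries:
        for c, (proto, n) in summary.items():
            if c not in weighted:
                weighted[c] = [0.0] * len(proto)
            weighted[c] = [w + n * p for w, p in zip(weighted[c], proto)]
            totals[c] += n
    return {c: [w / totals[c] for w in weighted[c]] for c in weighted}


# Toy example: client A holds only class 0; client B holds classes 0 and 1.
client_a = local_class_prototypes([[0.0, 0.0], [2.0, 2.0]], [0, 0])
client_b = local_class_prototypes([[4.0, 4.0], [1.0, 0.0]], [0, 1])
global_protos = condense_global_prototypes([client_a, client_b])
```

In this toy setup, client A never observes class 1, yet the condensed result contains a class 1 representative that A could add to its training set. This is the kind of absent knowledge the abstract refers to, under the simplifying assumptions stated above.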
