FedGT: Federated Node Classification with Scalable Graph Transformer
Zaixi Zhang, Qingyong Hu, Yang Yu, Weibo Gao, Qi Liu
TL;DR
The paper tackles node classification in subgraph federated learning under privacy constraints, where cross-subgraph links are missing and client data distributions are heterogeneous. It introduces FedGT, a scalable Graph Transformer with a hybrid local-global attention scheme: each node attends to $n_s$ sampled neighbors and $n_g$ curated global nodes, giving a linear complexity of $O(n(n_g+n_s))$ per forward pass over $n$ nodes. Global nodes are updated online via clustering. For personalization, client similarity is computed from global nodes aligned with optimal transport and then used for weighted, per-client aggregation; local differential privacy protects the shared information. A theoretical analysis bounds the approximation error of global attention, and experiments on six datasets under two subgraph settings show that FedGT achieves state-of-the-art performance while handling missing links and data heterogeneity.
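To make the complexity claim concrete, the following is a minimal sketch of hybrid attention for a single node, not the authors' implementation. It assumes a single softmax over the concatenated local and global keys, uses the raw feature vectors as both keys and values (no learned projections), and the names `hybrid_attention`, `local_feats`, and `global_feats` are hypothetical. The paper's exact formulation may differ.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def hybrid_attention(query, local_feats, global_feats):
    """Attend from one node to n_s sampled neighbors and n_g global nodes.

    Assumption: keys and values are the raw features (no learned projections).
    The node touches n_s + n_g keys, so n nodes cost O(n (n_g + n_s))
    for a fixed feature dimension.
    """
    keys = local_feats + global_feats  # n_s + n_g vectors
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(query)
    # Output is the attention-weighted average of the keys (used as values).
    out = [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(dim)]
    return out, weights


# Example: 2 sampled neighbors and 1 global node in a 2-dimensional space.
out, weights = hybrid_attention(
    query=[1.0, 0.0],
    local_feats=[[1.0, 0.0], [0.0, 1.0]],
    global_feats=[[0.5, 0.5]],
)
```

Because each node only ever sees $n_s + n_g$ keys, the cost does not depend on how many other nodes the subgraph contains, which is where the linear scaling in $n$ comes from.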
Abstract
Graphs are widely used to model relational data. As real-world graphs grow larger, there is a trend to store and compute subgraphs in multiple local systems. For example, recently proposed \emph{subgraph federated learning} methods train Graph Neural Networks (GNNs) distributively on local subgraphs and aggregate GNN parameters with a central server. However, existing methods have the following limitations: (1) The links between local subgraphs are missing in subgraph federated learning. This can severely damage the performance of GNNs that follow the message-passing paradigm to update node/edge features. (2) Most existing methods overlook subgraph heterogeneity, which arises because the subgraphs come from different parts of the whole graph. To address these challenges, we propose a scalable \textbf{Fed}erated \textbf{G}raph \textbf{T}ransformer (\textbf{FedGT}) in this paper. First, we design a hybrid attention scheme that reduces the complexity of the Graph Transformer to linear while ensuring a global receptive field with theoretical bounds. Specifically, each node attends to its sampled local neighbors and a set of curated global nodes, so that it learns both local and global information and is robust to missing links. The global nodes are dynamically updated during training with an online clustering algorithm to capture the data distribution of the corresponding local subgraph. Second, FedGT computes the similarity between clients based on their global nodes, aligned with optimal transport. The similarity is then used to perform weighted averaging for personalized aggregation, which addresses the data heterogeneity problem. Moreover, local differential privacy is applied to further protect the privacy of clients. Finally, extensive experimental results on 6 datasets and 2 subgraph settings demonstrate the superiority of FedGT.
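The personalized aggregation step can be illustrated with a small sketch, under several assumptions that are not taken from the paper. Alignment is done by exact brute-force matching over permutations, which coincides with optimal transport under uniform weights when both clients have the same number of global nodes and is only feasible for small $n_g$. Similarity is taken to be the cosine similarity of the aligned, flattened global nodes, and aggregation weights come from a softmax with a hypothetical `temperature` parameter. All function names are illustrative.

```python
import math
from itertools import permutations


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def align(ref_nodes, other_nodes):
    """Reorder other_nodes to best match ref_nodes.

    Assumption: brute-force assignment stands in for an optimal transport solver.
    """
    n = len(ref_nodes)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(sq_dist(ref_nodes[i], other_nodes[p[i]]) for i in range(n)),
    )
    return [other_nodes[j] for j in best]


def cosine(u, v):
    """Cosine similarity, defined as 0.0 if either vector is all zeros."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den > 0 else 0.0


def client_similarity(nodes_a, nodes_b):
    """Similarity of two clients from their aligned global nodes (assumed cosine)."""
    aligned_b = align(nodes_a, nodes_b)
    flat_a = [x for node in nodes_a for x in node]
    flat_b = [x for node in aligned_b for x in node]
    return cosine(flat_a, flat_b)


def personalized_aggregate(client_idx, global_nodes, params, temperature=1.0):
    """Similarity-weighted average of all clients' parameters for one client."""
    sims = [client_similarity(global_nodes[client_idx], g) for g in global_nodes]
    m = max(sims)
    exps = [math.exp((s - m) / temperature) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(params[client_idx])
    agg = [sum(w * p[j] for w, p in zip(weights, params)) for j in range(dim)]
    return agg, weights


# Example: client 1 holds the same global nodes as client 0 in a different order;
# client 2 holds very different ones and therefore gets a smaller weight.
global_nodes = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
params = [[1.0], [3.0], [10.0]]
agg, weights = personalized_aggregate(0, global_nodes, params)
```

The alignment step matters because cluster centers have no canonical order: without it, two clients with identical data distributions could appear dissimilar simply because their global nodes are listed in different orders.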
