Federated Graph Learning with Structure Proxy Alignment

Xingbo Fu, Zihan Chen, Binchi Zhang, Chen Chen, Jundong Li

TL;DR

This work addresses the challenge of data heterogeneity in Federated Graph Learning, particularly the biased neighboring information that harms minority-class node embeddings. It introduces FedSpray, a framework that combines personalized GNNs with a lightweight global feature-structure encoder and class-wise structure proxies to generate unbiased soft targets for regularizing local training via knowledge distillation. The method demonstrates consistent improvements over strong baselines across multiple real-world datasets, with notable gains for minority nodes, while offering privacy preservation, reduced communication, and lower computational costs. These results suggest that aligning global structure proxies in latent space can substantially enhance node classification in distributed graph settings. FedSpray thus provides a principled, scalable approach to robust FGL under non-IID and heterophilic conditions.
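
The regularization idea in the summary above can be pictured with a small sketch. It assumes, for illustration only, that the local objective is a cross-entropy term plus a KL-divergence term toward the soft targets, weighted by a coefficient named `lambda_1` after the $\lambda_1$ in Figure 4. The function names and the exact form of the loss are hypothetical and not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(gnn_logits, soft_target_logits, label, lambda_1=0.5):
    """Assumed objective: CE on the label plus lambda_1 * KL(soft target || GNN).

    gnn_logits: output of the personalized GNN for one node (hypothetical).
    soft_target_logits: output used as the soft target for that node (hypothetical).
    """
    student = softmax(gnn_logits)
    teacher = softmax(soft_target_logits)
    return cross_entropy(student, label) + lambda_1 * kl_divergence(teacher, student)


# A minority-class node whose GNN prediction is pulled toward class 0
# by its neighbors, while the soft target favors the true class 1.
loss_plain = distillation_loss([2.0, 0.0], [2.0, 0.0], label=1)
loss_regularized = distillation_loss([2.0, 0.0], [0.0, 2.0], label=1)
```

In this toy setting the second call adds a positive penalty because the soft target disagrees with the GNN output, which is the direction of correction the summary describes for minority nodes.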

Abstract

Federated Graph Learning (FGL) aims to learn graph learning models over graph data distributed across multiple data owners, and has been applied in areas such as social recommendation and financial fraud detection. Like generic Federated Learning (FL), from which it inherits, FGL suffers from data heterogeneity: the label distribution may vary significantly for distributed graph data across clients. For instance, one client can hold the majority of nodes from a class, while another client may have only a few nodes from the same class. This issue results in divergent local objectives and impairs FGL convergence for node-level tasks, especially for node classification. Moreover, FGL also encounters a unique challenge for the node classification task: the nodes from a minority class in a client are more likely to have biased neighboring information, which prevents FGL from learning expressive node embeddings with Graph Neural Networks (GNNs). To grapple with this challenge, we propose FedSpray, a novel FGL framework that learns local class-wise structure proxies in the latent space and aligns them to obtain global structure proxies on the server. Our goal is to obtain aligned structure proxies that can serve as reliable, unbiased neighboring information for node classification. To achieve this, FedSpray trains a global feature-structure encoder and generates unbiased soft targets with structure proxies to regularize local training of GNN models in a personalized way. We conduct extensive experiments over four datasets, and the experimental results validate the superiority of FedSpray compared with other baselines. Our code is available at https://github.com/xbfu/FedSpray.
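
As a rough illustration of the alignment step, the sketch below combines local class-wise structure proxies into global ones on the server. The abstract says only that local proxies are aligned to obtain global proxies. The node-count-weighted mean used here, the function name `align_structure_proxies`, and the dictionary layout are assumptions made for this example.

```python
def align_structure_proxies(local_proxies, node_counts):
    """Combine per-client class-wise proxies into global proxies.

    local_proxies: one dict per client, mapping class id -> proxy vector (list of floats).
    node_counts:   one dict per client, mapping class id -> number of local nodes in that class.
    Returns a dict mapping class id -> global proxy vector.

    Assumption: a mean weighted by local class size; the paper's rule may differ.
    """
    sums = {}
    totals = {}
    for proxies, counts in zip(local_proxies, node_counts):
        for cls, vec in proxies.items():
            weight = counts.get(cls, 0)
            if weight == 0:
                continue  # client has no nodes of this class, so skip its proxy
            acc = sums.setdefault(cls, [0.0] * len(vec))
            for i, value in enumerate(vec):
                acc[i] += weight * value
            totals[cls] = totals.get(cls, 0) + weight
    return {cls: [v / totals[cls] for v in acc] for cls, acc in sums.items()}


# Two clients with a skewed label distribution: class 0 is a majority
# class on client A and a minority class on client B.
client_a = {0: [1.0, 0.0], 1: [0.0, 2.0]}
client_b = {0: [0.0, 1.0]}
counts_a = {0: 3, 1: 2}
counts_b = {0: 1}

global_proxies = align_structure_proxies([client_a, client_b], [counts_a, counts_b])
```

Under this weighting, a client with only a few nodes of a class contributes less to that class's global proxy, which is one plausible way to keep minority-side noise from dominating. An unweighted mean would be an equally simple alternative.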

Paper Structure (49 sections, 1 theorem, 15 equations, 5 figures, 5 tables, 1 algorithm)

Key Result

Proposition 1

Given a set of $K$ clients, each client $k$ owns a local graph $\mathcal{G}^{(k)}\sim\text{Gen}(\boldsymbol{\mu}_1, \boldsymbol{\mu}_2, p^{(k)}, q^{(k)})$, $dist = \frac{||\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2||_2}{2}$, which is smaller than $dist' = \left(1 + \sum_{k=1}^{K}(1-q^{(k)})(p^{(k)}-\fr

Figures (5)

  • Figure 1: An example of a financial system including four banks. The four banks aim to jointly train a model for predicting a customer's occupation (i.e., Doctor or Teacher) orchestrated by a third-party company over their local data while keeping their private data locally.
  • Figure 2: Classification accuracy (%) of minority nodes in each client when training an MLP and a GNN via FedAvg on the PubMed dataset. Average accuracy over all nodes: 82.35% for MLP vs. 87.06% for GNN.
  • Figure 3: (a) An overview of the proposed FedSpray. The backbone of FedSpray is personalized GNN models $f(\theta^{(k)})$. A global feature-structure encoder $g(\omega)$ with structure proxies $\textbf{S}$ is also employed in FedSpray to tackle underrepresented node embeddings caused by adverse neighboring information in FGL. (b) An illustration of the feature-structure encoder in FedSpray.
  • Figure 4: Classification accuracy (%) of FedSpray on all nodes and minority nodes in the test sets with different values of $\lambda_1$ over (a) PubMed and (b) WikiCS with GraphSAGE.
  • Figure 5: Classification accuracy (%) of FedSpray on all nodes and minority nodes in the test sets with different $d_s$ over (a) WikiCS and (b) Physics with GCN.

Theorems & Definitions (1)

  • Proposition 1