
Federated Learning with Limited Node Labels

Bisheng Tang, Xiaojun Chen, Shaopu Wang, Yuexin Xuan, Zhendong Zhao

TL;DR

This work tackles node classification under federated learning with limited labels by introducing FedMpa, a two-stage subgraph federated learning (SFL) framework that first learns cross-subgraph global features via a federated MLP (FedMLP) and then diffuses that information locally through an APPNP-inspired propagation step. To address missing cross-subgraph edges, FedMpae complements FedMpa by reconstructing local structures: it pools missing neighbors into super-nodes and trains with a graph-autoencoder-based reconstruction loss, which enables cross-subgraph propagation without generating many extra nodes. Across six graph datasets, FedMpa and FedMpae perform competitively with or better than state-of-the-art FedSage variants in low-label scenarios, and ablations validate each component. The approach is of practical use for privacy-preserving graph learning: it reduces labeling requirements while improving online computation efficiency and scalability for real-world distributed graph data.

Abstract

Subgraph federated learning (SFL) is a research methodology that has gained significant attention for its potential to handle distributed graph-structured data. In SFL, each local model is a graph neural network (GNN) that sees only a partial graph structure. However, some SFL models overlook the significance of missing cross-subgraph edges, which can leave local GNNs unable to pass messages carrying global representations to other parties' GNNs. Moreover, existing SFL models require substantial labeled data, which limits their practical applications. To overcome these limitations, we present a novel SFL framework called FedMpa that aims to learn cross-subgraph node representations. FedMpa first trains a multilayer perceptron (MLP) model using a small amount of labeled data and then propagates the federated features over the local structures. To further improve the embeddings of nodes in local subgraphs, we introduce the FedMpae method, which reconstructs the local graph structure from a new perspective: it applies a pooling operation to form super-nodes. Our extensive experiments on six graph datasets demonstrate that FedMpa is highly effective in node classification. Furthermore, our ablation experiments verify the effectiveness of FedMpa.
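The "propagate the federated feature to the local structures" step can be illustrated with the standard APPNP update, Z ← (1 − α) Â Z + α H, where Â is the symmetrically normalized local adjacency matrix with self-loops and H holds the per-node MLP outputs. The following is a minimal NumPy sketch under those assumptions, not the authors' implementation: the function name `appnp_propagate`, the defaults `alpha=0.1` and `num_steps=10`, and the dense-matrix representation are all illustrative choices that do not come from the paper.

```python
import numpy as np


def appnp_propagate(adj, h, alpha=0.1, num_steps=10):
    """APPNP-style approximate personalized PageRank propagation.

    adj: (n, n) dense adjacency matrix of the local subgraph (assumed symmetric).
    h:   (n, c) per-node outputs of the federated MLP (hypothetical input).
    alpha and num_steps are illustrative defaults, not values from the paper.
    """
    n = adj.shape[0]
    a_tilde = adj + np.eye(n)                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_tilde.sum(axis=1))
    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
    a_hat = d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]

    z = h.copy()
    for _ in range(num_steps):
        # Mix neighbor information with a "teleport" back to the MLP output.
        z = (1.0 - alpha) * (a_hat @ z) + alpha * h
    return z


# Usage: a 3-node path graph with 2-dimensional MLP outputs.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
h = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])
z = appnp_propagate(adj, h)
```

Because propagation has no trainable parameters in this sketch, only the MLP weights would need to be exchanged between clients; the graph structure stays local.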

Paper Structure (25 sections, 10 equations, 6 figures, 4 tables, 2 algorithms)

Figures (6)

  • Figure 1: The upper half outlines the FedMpa framework. First, we use FedAvg to train the local features, with an MLP as the local model. The resulting FedMLP model provides global information for each local graph. With the learned features, we conduct local message passing with approximate PageRank, which we term FedMpa. To handle the cross-subgraph edges, we use graph augmentation to reconstruct the discarded edges, as shown in the right part of Figure 2, which we term FedMpae. The bottom half of the framework shows the FedMLP, FedMpa, and FedMpae modules, which detail the learning process and node classification task, including the cross-entropy loss $\mathcal{L}_{ce}$ and the local model. After FedMLP has finished, the parameter $W$ is transmitted separately to FedMpa and FedMpae.
  • Figure 2: Left: Generate the missing neighbor nodes in the subgraph with global features. Right: Repair the missing links or link weights in the subgraph with global features. The two panels present two approaches for repairing the subgraph with a global view. On the left, the orange nodes and corresponding dashed edges do not exist in the local client, so costly calculations are required to simulate the generation of the missing nodes and edges across the subgraph. On the right, we instead combine the missing neighbors (i.e., orange nodes) and their edges into a single new entity (i.e., a new super-node), whose augmented node feature is learned through the federated paradigm, and the corresponding edge weights (dashed lines) are also made learnable.
  • Figure 3: This figure illustrates the impact of the dropout rate on node classification for Cora, Coauthor-cs, and Computer with LocMpa, FedMpa, and FedMpae, respectively.
  • Figure 4: This figure illustrates the impact of the FedMpae hyper-parameters $\beta$ and $\gamma$ on node classification for Cora (left) and Computer (right). The dark green plane is the accuracy of FedSage+.
  • Figure 5: This figure illustrates the impact of label rate on node classification for Citeseer. The accuracy of both FedSage and FedSage+ increases as the label rate increases. Moreover, FedMpa and FedMpae have a pronounced advantage when the label rate is low.
  • ...and 1 more figure
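
The Figure 1 caption states that FedMLP is trained with FedAvg. As a rough illustration of the aggregation step only, the sketch below averages per-client MLP parameters weighted by local sample counts, which is the standard FedAvg rule. The function name `fedavg`, the dict-of-arrays parameter layout, and the choice to weight by the number of labeled nodes are assumptions made for this example; the paper's exact weighting is not given in this summary.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Standard FedAvg aggregation: sample-size-weighted parameter average.

    client_weights: list of dicts mapping parameter name -> np.ndarray
                    (hypothetical layout for the local MLP parameters).
    client_sizes:   number of local training samples per client
                    (assumed here to be the count of labeled nodes).
    """
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in keys
    }


# Usage: two clients holding a single 2x2 weight matrix "W".
clients = [{"W": np.ones((2, 2))}, {"W": 3.0 * np.ones((2, 2))}]
sizes = [1, 3]
global_weights = fedavg(clients, sizes)  # 0.25 * 1 + 0.75 * 3 = 2.5 per entry
```

In a full training loop, each client would run a few local gradient steps on its MLP between aggregation rounds; that part is omitted here.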