Federated Graph Learning with Adaptive Importance-based Sampling

Anran Li, Yuanyuan Chen, Chao Ren, Wenhan Wang, Ming Hu, Tianlin Li, Han Yu, Qingyu Chen

TL;DR

The Federated Adaptive Importance-based Sampling (FedAIS) approach achieves substantial computational cost savings by focusing limited resources on training important nodes, and reduces communication overhead via adaptive historical embedding synchronization.

Abstract

For privacy-preserving graph learning tasks involving distributed graph datasets, federated learning (FL)-based graph convolutional network (GCN) training, referred to as FedGCN, is required. A key challenge for FedGCN is scaling to large graphs, which typically incurs high computation and communication costs because the number of neighbors grows explosively. Existing graph sampling-enhanced FedGCN training approaches ignore either graph structural information or the dynamics of optimization, resulting in high variance and inaccurate node embeddings. To address this limitation, we propose the Federated Adaptive Importance-based Sampling (FedAIS) approach. It achieves substantial computational cost savings by focusing limited resources on training important nodes, while reducing communication overhead via adaptive historical embedding synchronization. The proposed adaptive importance-based sampling method jointly considers graph structural heterogeneity and optimization dynamics to achieve an optimal trade-off between efficiency and accuracy. Extensive evaluations against five state-of-the-art baselines on five real-world graph datasets show that FedAIS achieves comparable or up to 3.23% higher test accuracy, while saving communication and computation costs by 91.77% and 85.59%, respectively.
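
The two ideas named in the abstract can be illustrated with a minimal sketch. It is not the FedAIS algorithm: the abstract does not give the importance formula or the synchronization rule, so everything below is an assumption made for illustration. The function names, the blending weight `alpha`, the use of per-node loss as the optimization signal, node degree as the structural signal, and a fixed drift `threshold` are all hypothetical.

```python
import math
import random


def importance_scores(losses, degrees, alpha=0.5):
    """Blend a normalized per-node loss (optimization signal) with a
    normalized degree (structural signal). Hypothetical scoring rule,
    assuming non-negative losses and degrees."""
    max_loss = max(losses) or 1.0  # avoid division by zero
    max_deg = max(degrees) or 1
    return [
        alpha * (loss / max_loss) + (1 - alpha) * (deg / max_deg)
        for loss, deg in zip(losses, degrees)
    ]


def sample_nodes(scores, k, rng):
    """Draw up to k distinct node indices, each draw proportional to the
    remaining scores (sampling without replacement)."""
    candidates = list(range(len(scores)))
    weights = list(scores)
    chosen = []
    for _ in range(min(k, len(candidates))):
        if sum(weights) > 0:
            pos = rng.choices(range(len(candidates)), weights=weights, k=1)[0]
        else:
            # All remaining scores are zero: fall back to a uniform draw.
            pos = rng.randrange(len(candidates))
        chosen.append(candidates.pop(pos))
        weights.pop(pos)
    return chosen


def should_sync(cached_embedding, fresh_embedding, threshold):
    """Re-send an embedding only when it has drifted from the cached
    (historical) copy by more than `threshold` in Euclidean distance.
    Hypothetical synchronization rule."""
    drift = math.sqrt(
        sum((a - b) ** 2 for a, b in zip(cached_embedding, fresh_embedding))
    )
    return drift > threshold


scores = importance_scores(losses=[0.1, 2.0, 0.5, 0.0], degrees=[1, 4, 2, 0])
picked = sample_nodes(scores, k=2, rng=random.Random(0))
```

In this sketch, nodes with a high loss or many neighbors are more likely to be trained in a given round, and an embedding is transmitted only when its cached copy has become stale, which is where computation and communication savings of the kind described above would come from.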

Paper Structure

This paper contains 18 sections, 3 theorems, 13 equations, 7 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Under the Lipschitz assumption (ass:Lipschitz1), if for all $v\in V_k$ and all $l\in \{1, 2, \ldots, L-1\}$, the final output error of layer $L$ in training round $t\in [T]$ is bounded by:

Figures (7)

  • Figure 1: Test accuracy vs. communication costs.
  • Figure 2: System overview of FedAIS. $\textcircled{1}$-$\textcircled{2}$ cross-client neighbor embeddings, $\textcircled{3}$ local model, $\textcircled{4}$ global model $\theta_{t+1}$.
  • Figure 3: Accuracy scores with sizes of total communication cost for training different FedGCN models.
  • Figure 4: The total computation and communication costs for training various FedGCN models.
  • Figure 5: Model performance vs. various ablation baselines.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Theorem 1
  • Theorem 2
  • Theorem 3