FedGAT: A Privacy-Preserving Federated Approximation Algorithm for Graph Attention Networks

Siddharth Ambekar, Yuhang Yao, Ryan Li, Carlee Joe-Wong

TL;DR

This work tackles privacy-preserving federated training of Graph Attention Networks on graphs partitioned across clients with cross-client edges. It introduces FedGAT, which uses a one-round pre-training communication to compute a Chebyshev-polynomial approximation of the GAT update, enabling federated updates without sharing raw features and with formal approximation and privacy considerations. The method provides bounds on the approximation error, analyzes communication and computational costs, and demonstrates that FedGAT achieves accuracy close to a centrally trained GAT while remaining robust to the number of clients and data distribution. Experimental results on standard benchmarks show FedGAT outperforms baseline federated methods that drop cross-client edges and scales favorably in sparse graphs, indicating practical impact for privacy-preserving graph learning.
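To make the Chebyshev idea concrete, here is a minimal sketch of how a polynomial can stand in for an attention-style nonlinearity on a bounded interval. It is not the FedGAT algorithm itself. The target function `exp(LeakyReLU(x))`, the interval `[-2, 2]`, the slope `0.2`, and the helper names `attention_score` and `max_abs_error` are all assumptions chosen for illustration; the summary above does not say which function the paper approximates or over what range.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def attention_score(x, negative_slope=0.2):
    """Unnormalised GAT-style score exp(LeakyReLU(x)) (illustrative assumption)."""
    x = np.asarray(x, dtype=float)
    return np.exp(np.where(x > 0, x, negative_slope * x))


def max_abs_error(degree, domain=(-2.0, 2.0), n_grid=2001):
    """Largest gap between the score and its Chebyshev interpolant on a grid."""
    # Interpolate the score at Chebyshev points of the given (assumed) domain.
    approx = Chebyshev.interpolate(attention_score, degree, domain=list(domain))
    grid = np.linspace(domain[0], domain[1], n_grid)
    return float(np.max(np.abs(approx(grid) - attention_score(grid))))


# Compare a low-degree and a higher-degree approximation.
errors = {deg: max_abs_error(deg) for deg in (2, 8, 32)}
for deg, err in errors.items():
    print(f"degree {deg:2d}: max abs error = {err:.4f}")
```

In this toy setting, raising the degree trades more polynomial terms for a smaller worst-case error. That is the kind of trade-off an approximation-error bound would quantify.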

Abstract

Federated training methods have gained popularity for graph learning with applications including friendship graphs of social media sites and customer-merchant interaction graphs of huge online marketplaces. However, privacy regulations often require locally generated data to be stored on local clients. The graph is then naturally partitioned across clients, with no client permitted access to information stored on another. Cross-client edges arise naturally in such cases and present an interesting challenge to federated training methods, as training a graph model at one client requires feature information of nodes on the other end of cross-client edges. Attempting to retain such edges often incurs significant communication overhead, and dropping them altogether reduces model performance. In simpler models such as Graph Convolutional Networks, this can be fixed by communicating a limited amount of feature information across clients before training, but GATs (Graph Attention Networks) require additional information that cannot be pre-communicated, as it changes from training round to round. We introduce the Federated Graph Attention Network (FedGAT) algorithm for semi-supervised node classification, which approximates the behavior of GATs with provable bounds on the approximation error. FedGAT requires only one pre-training communication round, significantly reducing the communication overhead for federated GAT training. We then analyze the error in the approximation and examine the communication overhead and computational complexity of the algorithm. Experiments show that FedGAT achieves nearly the same accuracy as a GAT model in a centralised setting, and its performance is robust to the number of clients as well as data distribution.

Paper Structure

This paper contains 34 sections, 6 theorems, 84 equations, 8 figures, 2 tables, 2 algorithms.

Key Result

Theorem 1

For an $L$-layer GAT, the cost at the server of computing the pre-training communication values is $\mathcal{O}\left( KB_{L}d\left( B^{2} + B^{3} \right) \right)$. The resulting communication overhead is $\mathcal{O}\left( KB_{L}dB^{3} \right)$.
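A short worked step relates the two bounds. The symbols $K$, $B_{L}$, $d$ and $B$ are not defined in this summary, so the step only assumes that $B \geq 1$, as it would be if $B$ were a count such as a node degree:

$$B^{2} \leq B^{3} \;\Rightarrow\; B^{2} + B^{3} \leq 2B^{3} \;\Rightarrow\; \mathcal{O}\left( KB_{L}d\left( B^{2} + B^{3} \right) \right) = \mathcal{O}\left( KB_{L}dB^{3} \right).$$

Under that assumption, the server-side computation cost and the communication overhead are of the same order, and both are dominated by the $B^{3}$ term.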

Figures (8)

  • Figure 1: The central model parameters are denoted by $\mathcal{W}$. On receiving a copy of $\mathcal{W}$, clients compute local updates and send updates to the server, where they are aggregated.
  • Figure 2: Test accuracy vs. number of clients for iid and non-iid data distributions on the Cora (a,b) and Citeseer (c,d) datasets. FedGAT outperforms FedGCN and DistGAT.
  • Figure 3: The FedGAT pre-training communication increases with clients due to an increase in cross-client edges, leading to a larger sub-graph on each client. The cost is higher for an iid distribution due to increased crossing edges.
  • Figure 4: Pre-training communication cost for the Cora dataset. Here, the number of clients ranges from 20 to 100. Once again, we observe a near-linear increase in the cost with the number of clients.
  • Figure 5: Accuracy vs. degree of approximation for iid, partial iid and non-iid data distributions on Cora. The performance of DistGAT is shown only for reference; it does not involve approximations.
  • ...and 3 more figures

Theorems & Definitions (10)

  • Theorem 1: FedGAT overhead
  • Theorem 2: Chebyshev approximation
  • Theorem 3: Error in attention coefficients
  • Theorem 4: Error in layer 1 embeddings
  • Theorem 5: Error propagation across layers
  • Claim 1
  • Claim 2
  • Claim 3
  • Lemma 1
  • Claim 4