Deep Efficient Private Neighbor Generation for Subgraph Federated Learning

Ke Zhang, Lichao Sun, Bolin Ding, Siu Ming Yiu, Carl Yang

TL;DR

This paper tackles subgraph federated learning where cross-subgraph neighbors are incomplete due to data fragmentation, hindering GNN training. It introduces FedDEP, a framework comprising Deep Neighbor Generation (DGen) for multi-hop missing-neighbor embeddings, Efficient Pseudo-FL with prototypes (Proto) to reduce communication, and Noise-free edge-LDP (NFDP) to provide edge-local differential privacy without injecting noise. The authors provide theoretical guarantees for edge-LDP and validate the approach on four real-world datasets, showing improvements in utility and efficiency over strong baselines while maintaining privacy protections. The proposed combination enables scalable, privacy-preserving subgraph learning with enriched local contexts and reduced inter-client communication, appropriate for large, distributed graph environments.

Abstract

Behemoth graphs are often fragmented and separately stored by multiple data owners as distributed subgraphs in many realistic applications. Without harming data privacy, it is natural to consider the subgraph federated learning (subgraph FL) scenario, where each local client holds a subgraph of the entire global graph, to obtain globally generalized graph mining models. To overcome the unique challenge of incomplete information propagation on local subgraphs due to missing cross-subgraph neighbors, previous works resort to the augmentation of local neighborhoods through the joint FL of missing neighbor generators and GNNs. Yet their technical designs have profound limitations regarding the utility, efficiency, and privacy goals of FL. In this work, we propose FedDEP to comprehensively tackle these challenges in subgraph FL. FedDEP consists of a series of novel technical designs: (1) Deep neighbor generation through leveraging the GNN embeddings of potential missing neighbors; (2) Efficient pseudo-FL for neighbor generation through embedding prototyping; and (3) Privacy protection through noise-less edge-local-differential-privacy. We analyze the correctness and efficiency of FedDEP, and provide theoretical guarantees on its privacy. Empirical results on four real-world datasets justify the clear benefits of proposed techniques.
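To make the idea of embedding prototyping in design (2) more concrete, here is a minimal sketch of how a client might compress many node embeddings into a few prototypes before sharing them. The abstract does not say how prototypes are built, so everything here is an assumption for illustration: prototypes are taken to be k-means centroids, and the name `embedding_prototypes`, its parameters, and the toy data are hypothetical rather than taken from FedDEP.

```python
import numpy as np


def embedding_prototypes(embeddings, num_prototypes, num_iters=20, seed=0):
    """Hypothetical helper: cluster embeddings with plain k-means, return centroids.

    Assumption: a "prototype" is a cluster centroid. The paper's actual
    prototyping procedure may differ.
    """
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    # Initialise centroids from distinct, randomly chosen embeddings.
    centroids = embeddings[rng.choice(n, size=num_prototypes, replace=False)].copy()
    for _ in range(num_iters):
        # Squared distance from every embedding to every centroid: shape (n, k).
        dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_prototypes):
            members = embeddings[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)
    return centroids


# Toy data standing in for one client's GNN node embeddings (500 nodes, 16 dims).
rng = np.random.default_rng(1)
local_embeddings = rng.normal(size=(500, 16))
prototypes = embedding_prototypes(local_embeddings, num_prototypes=8)
print(prototypes.shape)  # (8, 16)
```

Under this assumption, a client would exchange 8 vectors instead of 500, which is the kind of communication saving that prototyping aims for.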

Paper Structure (14 sections, 1 theorem, 9 equations, 5 figures, 5 tables)

Key Result

Theorem 3.1

For a distributed subgraph system, on each subgraph, given every node's $L$-hop ego-graph with its every $L$-1 hop neighbors of degrees by at least $D$, FedDEP unifies all subgraphs in the system to federally train a joint model of a classifier and a cross-subgraph deep neighbor generator. By learni and $U$ = $\min \{\sqrt{\ln (e+\frac{\varepsilon\sqrt{LN }}{\delta'})}, \sqrt{\ln (\frac{1}{\delta'

Figures (5)

  • Figure 1: A toy example of modeling the spread of infectious disease in a distributed subgraph FL system. The black lines are the close-contact relations between people, and the dashed red lines are the cross-subgraph missing links. Red solid lines are the generated links, and the people figures with red solid rectangles are the generated neighbors. (a) The reason for a target to be diagnosed when his/her direct contacts are all healthy can be attributed to Pattern ①: some healthy neighbors are in direct contact with many diagnosed ones, or Pattern ②: many healthy neighbors are in direct contact with diagnosed ones. (b) If the global graph is available, both patterns are observable and a centralized GNN can correctly identify the reasons for both $u$ and $v$ to be infected. (c) In the more realistic setting of local subgraphs, neither pattern is observable and a GNN obtained from generic FL (such as FedAvg) will fail to learn why $u$ and $v$ are infected. (d) FedSage tries to recover 1-hop missing neighbors across local subgraphs through three steps, which require significant extra communication and computation. (e) Unfortunately, even if all 1-hop missing neighbors can be generated accurately, a GNN obtained through FedSage still fails because the correct patterns require access to deeper missing neighbors.
  • Figure 2: Technical motivation of FedDEP against FedSage. FedDEP generates information about multiple hops of neighbors, giving local nodes richer context than the direct missing neighbors generated by FedSage. For more details of FedSage, please refer to the background discussion in Appendix D and the FedSage paper (Zhang et al., 2021).
  • Figure 3: Overview of the proposed FedDEP (with the novel DGen, Proto, and NFDP components highlighted).
  • Figure 4: Component study for DGen in FedDEP with different depths $L$ of generated neighbor embeddings on four datasets with different values of $M$. $L$=0 corresponds to FedSage.
  • Figure 5: Training curves of different frameworks on the Cora dataset with $M$=5. (Best viewed in color.)

Theorems & Definitions (1)

  • Theorem 3.1: Noise-free edge-LDP of FedDEP