Federated Graph Condensation with Information Bottleneck Principles

Bo Yan, Sihao He, Cheng Yang, Shang Liu, Yang Cao, Chuan Shi

TL;DR

Federated graph condensation (FGC) aims to learn a compact condensed graph from subgraphs distributed across clients while protecting their privacy. The paper introduces FedGC, a gradient-matching framework that aggregates class-aware client gradients to update a small condensed graph $G'$. A single encrypted step recovers cross-client neighbors, and a local graph transformation based on the information bottleneck (IB) principle defends against membership inference attacks (MIA). Theoretical results bound the membership privacy leakage in terms of the mutual information $I(G_i; G')$ and $I(G_i; G_i^t)$ and show a privacy-utility trade-off controlled by $\gamma$. Empirically, on five real-world datasets, FedGC achieves utility competitive with or superior to centralized GC and state-of-the-art FGL baselines while consistently reducing membership privacy leakage, with stronger gains in non-IID settings. The work thus provides a practical, privacy-preserving approach to scalable graph condensation in distributed environments and supports downstream personalization via local fine-tuning.

Abstract

Graph condensation (GC), which reduces the size of a large-scale graph by synthesizing a small-scale condensed graph as its substitute, has benefited various graph learning tasks. However, existing GC methods rely on centralized data storage, which is infeasible for real-world decentralized data distribution, and overlook data holders' privacy-preserving requirements. To bridge this gap, we propose and study the novel problem of federated graph condensation (FGC) for graph neural networks (GNNs). Specifically, we first propose a general framework for FGC, where we decouple the typical gradient matching process for GC into client-side gradient calculation and server-side gradient matching, integrating knowledge from multiple clients' subgraphs into one smaller condensed graph. Nevertheless, our empirical studies show that under the federated setting, the condensed graph will consistently leak data membership privacy, i.e., the condensed graph during federated training can be utilized to steal training data under the membership inference attack (MIA). To tackle this issue, we incorporate information bottleneck principles into FGC, which only requires extracting partial node features in one local pre-training step and utilizing those features during federated training. Theoretical and experimental analyses demonstrate that our framework consistently protects membership privacy during training. Meanwhile, it can achieve comparable and even superior performance against existing centralized GC and federated graph learning (FGL) methods.
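
The split between client-side gradient calculation and server-side gradient matching described above can be illustrated with a small sketch. This is not the paper's implementation: the function names, the weighting of client gradients by per-class sample counts, and the use of cosine distance as the matching objective are all assumptions made for illustration. Gradients are represented as plain lists of floats, and the computation of the gradients themselves (on a GNN) is left out.

```python
import math

def aggregate_class_gradients(client_grads, client_counts):
    """Server side: average per-class gradients uploaded by clients.

    client_grads:  list of dicts {class_label: gradient as list of floats}
    client_counts: list of dicts {class_label: number of local samples}
    Weighting by sample count is an assumption, not the paper's stated rule.
    """
    totals, weights = {}, {}
    for grads, counts in zip(client_grads, client_counts):
        for label, grad in grads.items():
            n = counts[label]
            if label not in totals:
                totals[label] = [0.0] * len(grad)
                weights[label] = 0
            totals[label] = [t + n * g for t, g in zip(totals[label], grad)]
            weights[label] += n
    return {c: [t / weights[c] for t in totals[c]] for c in totals}

def cosine_distance(u, v):
    """Return 1 - cos(u, v); defined as 1.0 when either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)

def matching_loss(real_grads, condensed_grads):
    """Sum of per-class distances between aggregated client gradients and
    gradients computed on the condensed graph (hypothetical objective)."""
    return sum(
        cosine_distance(real_grads[c], condensed_grads[c])
        for c in real_grads
        if c in condensed_grads
    )

# Two clients; only the second one holds samples of class 1.
client_grads = [{0: [1.0, 0.0]}, {0: [0.0, 1.0], 1: [2.0, 2.0]}]
client_counts = [{0: 3}, {0: 1, 1: 2}]
aggregated = aggregate_class_gradients(client_grads, client_counts)
loss_same = matching_loss(aggregated, aggregated)
```

In a full pipeline, the server would minimize such a loss with respect to the condensed graph's features and structure, while clients only ever upload gradients rather than raw subgraphs.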

Paper Structure

This paper contains 28 sections, 16 equations, 10 figures, 6 tables, 1 algorithm.

Figures (10)

  • Figure 1: The comparison between (a) centralized graph condensation (GC), (b) federated graph learning (FGL), and (c) federated graph condensation (FGC).
  • Figure 2: The comparison between MIA on condensed graphs and traditional MIA on graphs. The components circled by blue dashed lines are invisible to attackers.
  • Figure 3: The comparison between the utility of condensed graph (accuracy of node classification $\uparrow$) and privacy attack performance (AUC of MIA $\downarrow$) during federated training (left: Cora, right: Citeseer).
  • Figure 4: The overall workflow of local graph transformation with information bottleneck principles.
  • Figure 5: Ablation study for the performance of condensed graph (left $\uparrow$) and MIA (right $\downarrow$).
  • ...and 5 more figures
