
OpenFGL: A Comprehensive Benchmark for Federated Graph Learning

Xunkai Li, Yinlin Zhu, Boyang Pang, Guochen Yan, Yeyu Yan, Zening Li, Zhengyu Wu, Wentao Zhang, Rong-Hua Li, Guoren Wang

TL;DR

OpenFGL addresses the need for fair evaluation in Federated Graph Learning by introducing a comprehensive benchmark that unites Graph-FL and Subgraph-FL under a single framework. It integrates 42 datasets across 18 domains, eight federated data-simulation strategies, five graph-based downstream tasks, and 18 SOTA FGL algorithms via a unified API, enabling rigorous comparisons of effectiveness, robustness, and efficiency. The work provides extensive experimental insights, including performance trends across data heterogeneity, resilience to noise, privacy trade-offs, and complexity analysis, while offering guidance and tutorials to facilitate adoption and future research. By open-sourcing the library and standardizing evaluation, OpenFGL aims to accelerate interdisciplinary collaboration between FGL and data systems, driving scalable, privacy-preserving graph learning in real-world deployments.

Abstract

Federated graph learning (FGL) is a promising distributed training paradigm for graph neural networks across multiple local systems without direct data sharing. This approach inherently involves large-scale distributed graph processing, which closely aligns with the challenges and research focus of graph-based data systems. Despite the proliferation of FGL, the diverse motivations behind real-world applications, spanning various research backgrounds and settings, make fair evaluation a significant challenge. To fill this gap, we propose OpenFGL, a unified benchmark designed for the two primary FGL scenarios: Graph-FL and Subgraph-FL. Specifically, OpenFGL includes 42 graph datasets from 18 application domains, 8 federated data simulation strategies that emphasize different graph properties, and 5 graph-based downstream tasks. Additionally, it offers 18 recently proposed SOTA FGL algorithms through a user-friendly API, enabling a thorough comparison and comprehensive evaluation of their effectiveness, robustness, and efficiency. Our empirical results demonstrate the capabilities of FGL while also highlighting its potential limitations, providing valuable insights for future research in this growing field, particularly in fostering greater interdisciplinary collaboration between FGL and data systems.
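The training paradigm described in the first sentence (local training on each system, no raw data leaving it) can be illustrated with FedAvg-style aggregation, the canonical server-side rule in federated learning. This is a generic sketch under stated assumptions, not OpenFGL's API: the scalar "model", the toy squared-error objective, and the helper names `local_update` and `fedavg` are hypothetical, and a real FGL client would instead train a GNN on its local graph or subgraph. Only parameter vectors cross the client/server boundary.

```python
from typing import List, Sequence

Vector = List[float]


def local_update(w: Vector, data: Sequence[float], lr: float = 0.1, epochs: int = 5) -> Vector:
    """Hypothetical client step: gradient descent on the mean squared error of a scalar model w[0].

    Stands in for local GNN training; only the updated parameters are returned to the server.
    """
    w0 = w[0]
    for _ in range(epochs):
        grad = sum(2.0 * (w0 - x) for x in data) / len(data)
        w0 -= lr * grad
    return [w0]


def fedavg(client_params: Sequence[Vector], client_sizes: Sequence[int]) -> Vector:
    """Server step: average client parameters, weighting each client by its local sample count."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need at least one client and one size per client")
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * p[k] for p, n in zip(client_params, client_sizes)) / total
        for k in range(dim)
    ]


if __name__ == "__main__":
    # Three clients with private toy data; the server only ever sees parameter vectors.
    clients = [[1.0, 2.0, 3.0], [10.0, 12.0], [5.0]]
    global_w: Vector = [0.0]
    for _ in range(20):
        local = [local_update(global_w, d) for d in clients]
        global_w = fedavg(local, [len(d) for d in clients])
    print(f"global model after 20 rounds: {global_w[0]:.3f}")
```

With this quadratic toy objective, the rounds converge to the sample-weighted mean of all clients' data (33/6 = 5.5 here), i.e. the same point centralized training would reach, even though no client ever shares its raw values.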

Paper Structure

This paper contains 30 sections, 1 theorem, 5 equations, 2 figures, 16 tables, and 3 algorithms.

Key Result

Lemma A.4

For any node $v_{i}$, let $\mathbf{y}_{i}$ denote the ground truth and $\tilde{\mathbf{y}}_{i}$ the prediction of a $1$-layer GNN. Consider a loss function $\mathcal{L}$ of the form $\mathcal{L}(G,\mathbf{\Theta})=\sum_{v_{i}\in \mathcal{V}}\ell(\tilde{\mathbf{y}}_{i};\mathbf{y}_{i})$. The …, where $d_{max}$ denotes the maximum degree of the graph $G$.
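One way to see why a node-level bound for a $1$-layer GNN depends on $d_{max}$: each prediction $\tilde{\mathbf{y}}_{i}$ depends only on $v_{i}$ and its direct neighbours, so deleting a node $v$ can alter at most $\deg(v)+1 \le d_{max}+1$ summands of $\mathcal{L}$ (the term of $v$ itself disappears and each neighbour's term may change). The sketch below checks this counting argument numerically on a toy undirected graph. The scalar features, the sum-with-self-loop aggregation, and the helper names are our own assumptions; the code does not reproduce the lemma's bound itself.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected adjacency: node -> set of neighbours


def one_layer_gnn(graph: Graph, x: Dict[int, float], w: float) -> Dict[int, float]:
    """Hypothetical 1-layer GNN: y_i = w * (x_i + sum of neighbour features)."""
    return {i: w * (x[i] + sum(x[j] for j in graph[i])) for i in graph}


def remove_node(graph: Graph, v: int) -> Graph:
    """Return a copy of the graph with node v and all its incident edges deleted."""
    return {i: nbrs - {v} for i, nbrs in graph.items() if i != v}


def affected_terms(graph: Graph, x: Dict[int, float], w: float, v: int) -> Set[int]:
    """Nodes whose loss summand changes when v is removed (v's own term vanishes)."""
    before = one_layer_gnn(graph, x, w)
    after = one_layer_gnn(remove_node(graph, v), x, w)
    changed = {i for i in after if after[i] != before[i]}
    return changed | {v}


if __name__ == "__main__":
    # Node 0 is a hub with degree 3, so d_max = 3.
    g: Graph = {0: {1, 2, 3}, 1: {0, 4}, 2: {0}, 3: {0}, 4: {1}}
    feats = {i: float(i + 1) for i in g}  # nonzero features so every edge matters
    d_max = max(len(n) for n in g.values())
    for v in g:
        n_changed = len(affected_terms(g, feats, 0.5, v))
        assert n_changed <= d_max + 1
        print(f"remove node {v}: {n_changed} loss terms change (d_max + 1 = {d_max + 1})")
```

The hub attains the worst case: removing node 0 changes the terms of nodes 0, 1, 2, and 3, exactly $d_{max}+1$ of them. With deeper GNNs the affected set grows to the multi-hop neighbourhood, which is why this style of argument is usually stated for shallow models.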

Figures (2)

  • Figure 1: Robustness performance on Graph-FL ENZYMES (upper) and Subgraph-FL Cora (lower).
  • Figure 2: Practical efficiency in terms of performance, communication costs, and running time.

Theorems & Definitions (4)

  • Definition A.1: Differential Privacy (Dwork et al., 2006)
  • Definition A.2: Rényi Differential Privacy (Mironov, 2017)
  • Definition A.3: $\ell_{2}$-Sensitivity
  • Lemma A.4: Node-Level Sensitivity of the $1$-Layer GNN (Daigavane et al., 2021)
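As an illustration of how these definitions are commonly combined (a generic privacy-accountant sketch, not code from OpenFGL): the Gaussian mechanism that adds $\mathcal{N}(0,\sigma^{2}I)$ noise to a function with $\ell_{2}$-sensitivity $\Delta$ satisfies $(\alpha, \alpha\Delta^{2}/(2\sigma^{2}))$-RDP for every order $\alpha > 1$; RDP guarantees at the same order add up under composition; and $(\alpha,\varepsilon)$-RDP implies $(\varepsilon + \log(1/\delta)/(\alpha-1), \delta)$-DP (Mironov, 2017). The function names, the grid of orders, and the example parameters are hypothetical. In a node-level setting, $\Delta$ would come from a sensitivity bound such as Lemma A.4.

```python
import math
from typing import Sequence

# Hypothetical grid of Renyi orders alpha > 1; a finer or wider grid can give a tighter epsilon.
DEFAULT_ORDERS: Sequence[float] = tuple(1.0 + k / 4 for k in range(1, 1025))


def gaussian_rdp(alpha: float, sensitivity: float, sigma: float) -> float:
    """RDP epsilon at order alpha of f(D) + N(0, sigma^2 I) when f has the given l2-sensitivity."""
    return alpha * sensitivity ** 2 / (2.0 * sigma ** 2)


def rdp_to_dp(rdp_eps: float, alpha: float, delta: float) -> float:
    """Convert an (alpha, rdp_eps)-RDP guarantee into the epsilon of an (epsilon, delta)-DP one."""
    return rdp_eps + math.log(1.0 / delta) / (alpha - 1.0)


def gaussian_dp_epsilon(sensitivity: float, sigma: float, rounds: int, delta: float,
                        orders: Sequence[float] = DEFAULT_ORDERS) -> float:
    """Compose `rounds` Gaussian releases in RDP, then take the smallest DP epsilon over the orders."""
    return min(
        rdp_to_dp(rounds * gaussian_rdp(a, sensitivity, sigma), a, delta) for a in orders
    )


if __name__ == "__main__":
    # Hypothetical setting: unit sensitivity, noise std 4.0, 100 noisy releases (e.g. training rounds).
    eps = gaussian_dp_epsilon(sensitivity=1.0, sigma=4.0, rounds=100, delta=1e-5)
    print(f"(epsilon, delta) = ({eps:.2f}, 1e-05)")
```

Accounting in RDP and converting once at the end is the usual design choice because composing many rounds directly in $(\varepsilon,\delta)$-DP gives noticeably looser guarantees. Larger $\sigma$ or a smaller sensitivity $\Delta$ lowers $\varepsilon$, which is the privacy/utility trade-off the benchmark's privacy experiments probe.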