NVIDIA FLARE: Federated Learning from Simulation to Real-World

Holger R. Roth, Yan Cheng, Yuhong Wen, Isaac Yang, Ziyue Xu, Yuan-Ting Hsieh, Kristopher Kersten, Ahmed Harouni, Can Zhao, Kevin Lu, Zhihong Zhang, Wenqi Li, Andriy Myronenko, Dong Yang, Sean Yang, Nicola Rieke, Abood Quraini, Chester Chen, Daguang Xu, Nic Ma, Prerna Dogra, Mona Flores, Andrew Feng

TL;DR

The paper presents NVFlare, an open-source federated learning SDK designed to bridge research simulation and real-world deployment while preserving privacy. It introduces a spec-based, controller–worker architecture with a rich set of components (simulator, dashboard, reference algorithms, privacy filters) and supports various FL paradigms, including FedAvg, FedProx, SCAFFOLD, split learning, and federated statistics. Through real-world healthcare use cases and comparative analyses, the authors demonstrate NVFlare's versatility, scalability, and security features (mutual TLS, RBAC, auditing). They argue for NVFlare as a flexible platform capable of integrating with existing ML ecosystems and enabling future large-language-model federated fine-tuning, with ongoing work on privacy-preserving and production-ready improvements.

Abstract

Federated learning (FL) enables building robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package. It allows researchers to apply their data science workflows with any training library (PyTorch, TensorFlow, XGBoost, or even NumPy) in real-world FL settings. This paper introduces the key design principles of NVFlare and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. Code is available at https://github.com/NVIDIA/NVFlare.
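
To make the idea of learning "without centralizing the data" concrete, the following is a minimal sketch of the server-side aggregation step of FedAvg, one of the algorithms named in the TL;DR. It is written in plain NumPy and does not use the NVFlare API; the function name `fedavg`, the parameter dictionaries, and the client sizes are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Average client parameters, weighting each client by its sample count.

    client_weights: list of dicts mapping parameter name -> np.ndarray
    client_sizes:   list of local training-set sizes, one per client
    """
    total = float(sum(client_sizes))
    keys = client_weights[0].keys()
    return {
        k: sum(w[k] * (n / total) for w, n in zip(client_weights, client_sizes))
        for k in keys
    }


# Two hypothetical clients; only parameters leave each site, never raw data.
clients = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([1.0])},
]
sizes = [100, 300]

global_model = fedavg(clients, sizes)
```

In this sketch the second client holds three times as much data, so its parameters contribute three quarters of each averaged value.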

Paper Structure (29 sections, 8 figures, 1 table)

Figures (8)

  • Figure 1: NVFlare job execution. The Controller is a Python object that coordinates the Workers to get a job done; it runs on the FL server. A Worker performs the tasks it is assigned; Workers run on FL clients.
  • Figure 2: Comparison of gRPC and TCP communication drivers in NVFlare. The server is running on Azure. The clients are distributed between Azure and AWS. The message size is $\sim$18 MB. Communication times were measured over 100 rounds of FedAvg. Error bars indicate the 95% confidence intervals.
  • Figure 3: High-level steps for running a real-world study with secure provisioning with NVFlare.
  • Figure 4: Federated learning experiments with NVFlare.
  • Figure 5: Tree-based federated XGBoost: a "boosting of forests."
  • ...and 3 more figures
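
The Controller/Worker split described in the Figure 1 caption can be sketched as follows. This is an illustrative toy under stated assumptions, not NVFlare's actual classes or signatures: `Task`, `Worker`, `Controller`, and the `"local_stats"` task name are hypothetical. The example uses a simple federated-statistics job, in which each Worker returns only a count and a sum and the Controller combines them into a global mean.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    """A named unit of work the Controller assigns to Workers (hypothetical)."""
    name: str


class Worker:
    """Runs on an FL client; its data never leaves this object."""

    def __init__(self, name: str, data: List[float]):
        self.name = name
        self._data = data

    def execute(self, task: Task) -> Dict[str, float]:
        if task.name == "local_stats":
            # Return aggregates only, not the raw values.
            return {"count": float(len(self._data)), "sum": float(sum(self._data))}
        raise ValueError(f"unknown task: {task.name}")


class Controller:
    """Runs on the FL server; assigns tasks and combines the results."""

    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def run(self) -> float:
        task = Task("local_stats")
        results = [w.execute(task) for w in self.workers]
        total_count = sum(r["count"] for r in results)
        total_sum = sum(r["sum"] for r in results)
        return total_sum / total_count


workers = [Worker("site-1", [1.0, 2.0, 3.0]), Worker("site-2", [10.0])]
global_mean = Controller(workers).run()
```

A real deployment would send tasks over the network and handle authentication, timeouts, and failed clients; the sketch keeps everything in one process to show only the division of responsibilities.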