Data-Free Black-Box Federated Learning via Zeroth-Order Gradient Estimation

Xinge Ma, Jin Wang, Xuejie Zhang

TL;DR

Data privacy concerns and non-IID data distributions hinder scalable federated learning across heterogeneous clients. The paper proposes FedZGE, a data-free, black-box FL framework that trains a server-side generator to synthesize task-specific data and uses zeroth-order gradient estimation to update the generator without exposing on-device models or requiring auxiliary data. FedZGE trains the generator for fidelity, transferability, diversity, and equilibrium of the generated samples through four losses (fidelity, adversarial, diversity, and information entropy), and improves robustness to data and model heterogeneity via local distillation with ensemble supervision. Empirical results on CIFAR-10/100 show that FedZGE achieves strong accuracy with competitive or lower communication overhead and improved privacy relative to existing parameter-based, distillation-based, and data-free baselines. The approach offers a practical pathway to privacy-preserving, scalable knowledge distillation (KD)-based FL in settings where neither auxiliary data nor white-box model access is available.
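
The summary names an information-entropy loss (for class equilibrium) and local distillation with ensemble supervision but does not give their formulas. As a hedged illustration, the sketch below assumes the forms common in data-free distillation: the entropy loss is the negative entropy of the batch-averaged predicted class distribution, which is lowest when synthetic samples cover the classes evenly, and the ensemble teacher is the mean of the clients' softmax outputs, which the student matches via KL divergence. The names `info_entropy_loss`, `ensemble_kd_loss`, and the `temperature` parameter are hypothetical, not taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def info_entropy_loss(batch_logits: List[List[float]]) -> float:
    """Negative entropy of the batch-averaged class distribution (assumed form).

    Lowest (-log K) when the batch is spread evenly over the K classes, so
    minimizing it pushes the generator toward class-balanced batches.
    """
    probs = [softmax(z) for z in batch_logits]
    num_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(num_classes)]
    return sum(p * math.log(p) for p in mean if p > 0)


def ensemble_kd_loss(student_logits: List[float],
                     client_logits: List[List[float]],
                     temperature: float = 1.0) -> float:
    """KL(teacher || student), teacher = mean of client softmax outputs (assumed form)."""
    teachers = [softmax([z / temperature for z in zs]) for zs in client_logits]
    k = len(student_logits)
    teacher = [sum(t[c] for t in teachers) / len(teachers) for c in range(k)]
    student = softmax([z / temperature for z in student_logits])
    return sum(t * (math.log(t) - math.log(s))
               for t, s in zip(teacher, student) if t > 0)
```

Averaging client probabilities rather than raw logits keeps the ensemble teacher a valid distribution regardless of how differently scaled the heterogeneous client models' logits are.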

Abstract

Federated learning (FL) enables decentralized clients to collaboratively train a global model under the orchestration of a central server without exposing their individual data. However, the iterative exchange of model parameters between the server and clients imposes heavy communication burdens, risks privacy leakage, and can even preclude collaboration among heterogeneous clients. Distillation-based FL tackles these challenges by exchanging low-dimensional model outputs rather than model parameters, yet it relies heavily on a task-relevant auxiliary dataset that is often unavailable in practice. Data-free FL attempts to overcome this limitation by training a server-side generator to synthesize task-specific data samples directly for knowledge transfer. However, the generator's update rule requires clients to share their on-device models for white-box access, which largely undermines the advantages of distillation-based FL. This motivates us to explore a data-free, black-box FL framework via Zeroth-order Gradient Estimation (FedZGE), which estimates the gradients that flow back through the on-device models in a black-box optimization manner. These estimates complete the training of the generator in terms of fidelity, transferability, diversity, and equilibrium without involving any auxiliary data or sharing any model parameters, thus combining the advantages of distillation-based FL and data-free FL. Experiments on large-scale image classification datasets and network architectures demonstrate the superiority of FedZGE under data heterogeneity and model heterogeneity, as well as in communication efficiency and privacy protection.
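
The core mechanism can be illustrated with a standard two-point zeroth-order estimator. Assuming each client exposes only a scalar loss for a queried input (the black-box setting), the server can perturb a synthetic sample x along random Gaussian directions u and average ((L(x + mu*u) - L(x)) / mu) * u, which approximates the gradient of L at x without any access to model parameters. The sketch below is a generic estimator under that assumption, not necessarily the exact variant FedZGE uses; `zo_gradient`, `num_queries`, and `mu` are hypothetical names, and a simple quadratic stands in for a client model so the estimate can be checked against the true gradient.

```python
import random
from typing import Callable, List

Vector = List[float]


def zo_gradient(loss_fn: Callable[[Vector], float], x: Vector,
                num_queries: int = 200, mu: float = 1e-3,
                seed: int = 0) -> Vector:
    """Two-point zeroth-order estimate of the gradient of loss_fn at x.

    loss_fn is treated as a black box (hypothetical stand-in for a client
    model plus loss): only its scalar outputs are used, never its parameters.
    """
    rng = random.Random(seed)
    base = loss_fn(x)  # one query at the unperturbed point
    grad = [0.0] * len(x)
    for _ in range(num_queries):
        # Random search direction u ~ N(0, I).
        u = [rng.gauss(0.0, 1.0) for _ in x]
        # Forward finite difference of the black-box loss along u.
        perturbed = [xi + mu * ui for xi, ui in zip(x, u)]
        slope = (loss_fn(perturbed) - base) / mu
        # Accumulate slope * u; the average approximates the gradient.
        for j, uj in enumerate(u):
            grad[j] += slope * uj / num_queries
    return grad


if __name__ == "__main__":
    # Toy black box: L(x) = sum((x_i - 1)^2); true gradient at 0 is [-2, -2, -2].
    toy_loss = lambda v: sum((vi - 1.0) ** 2 for vi in v)
    print(zo_gradient(toy_loss, [0.0, 0.0, 0.0], num_queries=5000, mu=1e-4))
```

In a PyTorch training loop, such an estimate for the generator's output x could then be pushed through the generator with `x.backward(gradient=g)`, since `torch.Tensor.backward` accepts an explicit upstream gradient. The number of queries trades estimation variance against the extra black-box queries each client must answer.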

Paper Structure

This paper contains 44 sections, 20 equations, 6 figures, 10 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of the proposed FedZGE framework.
  • Figure 2: Ablation study of FedZGE and FedZGE with true gradients on the CIFAR-10 and CIFAR-100 datasets.
  • Figure 3: Visualization of data heterogeneity among clients on the CIFAR-10 dataset, where the abscissa denotes the client id, the ordinate denotes the class label, and the size of each scattered point denotes the number of training samples available for each client under each label.
  • Figure 4: Visualization of data heterogeneity among clients on the CIFAR-100 dataset, where the abscissa denotes the client id, the ordinate denotes the class label, and the size of each scattered point denotes the number of training samples available for each client under each label.
  • Figure 5: Visualization of original and synthetic data on the CIFAR-10 and CIFAR-100 datasets.
  • ...and 1 more figure
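
Figures 3 and 4 plot per-client label counts under non-IID partitions. The summary does not state how the partitions were generated; a common choice in FL experiments, assumed here, is to draw each class's split across clients from a Dirichlet distribution with concentration alpha, where a smaller alpha yields more skewed clients. The functions `dirichlet_partition` and `label_counts` below are a hypothetical, standard-library-only sketch of that scheme, with Dirichlet samples built from normalized Gamma draws.

```python
import random
from typing import Dict, List, Tuple


def dirichlet_partition(labels: List[int], num_clients: int, alpha: float,
                        seed: int = 0) -> List[List[int]]:
    """Split sample indices across clients; each class is divided by Dir(alpha) shares."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    clients: List[List[int]] = [[] for _ in range(num_clients)]
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        # Dirichlet(alpha, ..., alpha) sample via normalized Gamma(alpha, 1) draws.
        weights = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(weights)
        # Convert cumulative proportions into cut points over this class's indices.
        cuts, acc = [], 0.0
        for w in weights[:-1]:
            acc += w / total
            cuts.append(round(acc * len(idxs)))
        start = 0
        for client, end in enumerate(cuts + [len(idxs)]):
            clients[client].extend(idxs[start:end])
            start = end
    return clients


def label_counts(parts: List[List[int]], labels: List[int]) -> Dict[Tuple[int, int], int]:
    """Number of samples per (client id, class label), i.e. the point sizes in the plots."""
    counts: Dict[Tuple[int, int], int] = {}
    for client, idxs in enumerate(parts):
        for i in idxs:
            key = (client, labels[i])
            counts[key] = counts.get(key, 0) + 1
    return counts
```

Plotting `label_counts` with client id on the abscissa, class label on the ordinate, and marker size proportional to the count reproduces the kind of view described in the captions.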