DART: A Server-side Plug-in for Resource-efficient Robust Federated Learning

Omar Bekdache, Naresh Shanbhag

Abstract

Federated learning (FL) has emerged as a popular distributed approach for training machine learning models on edge devices while preserving data privacy. However, FL systems are limited both by client-side computational constraints and by a lack of robustness to naturally occurring common corruptions such as noise, blur, and weather effects. Existing robust training methods are computationally expensive and therefore unsuitable for resource-constrained clients. We propose a novel data-agnostic robust training (DART) plug-in that can be deployed in any FL system to enhance robustness at zero client overhead. DART operates on the server side and does not require access to private data, so it integrates seamlessly into existing FL systems. Extensive experiments show that DART enhances the robustness of state-of-the-art FL systems, establishing it as a practical and scalable solution for real-world robust FL deployment.

Paper Structure

This paper contains 48 sections, 3 theorems, 26 equations, 10 figures, 11 tables, 2 algorithms.

Key Result

Theorem 1

For any $\gamma \in (0, 1]$, the clean risk of the student model $\mathcal{R}_{\mathcal{D}_\text{in}}(f_{\mathbf{w}_\text{s}})$ satisfies the following upper bound:

Figures (10)

  • Figure 1: Clients in conventional Clean FL (a) employ local data to train while the server aggregates the model to generate a fragile model. Robust FL (b) employs computationally intensive data augmentation and robust training on the clients while the server aggregates as in Clean FL. Our proposed DART-enhanced FL method (c) employs low-cost clean training on the clients and the data-agnostic robustness training (DART) method on the server to train a robust model, exploiting the asymmetry in the computational resources (time and energy) between the clients and the server.
  • Figure 2: The proposed Data-Agnostic Robustness Training (DART) server-side plug-in. DART initializes both the teacher and student model parameters with model $f_{\mathbf{w}_0}$ pre-trained on clean images and then minimizes the loss $\mathcal{L}_\text{DART}$ by updating the student model. The resulting model $f_{\mathbf{w}_\text{rob}}$ achieves enhanced robustness with similar clean accuracy compared to $f_{\mathbf{w}_0}$ with zero robust training overhead on the clients.
  • Figure 3: ResNet-18 CIFAR-10-C robust accuracy as a function of training time and energy. Across all data heterogeneity $\alpha_\text{IID}$ levels, DART-enhanced methods consistently lie near the Pareto frontier (dashed curve), demonstrating a favorable robustness-efficiency tradeoff.
  • Figure 4: CIFAR-10 clean and CIFAR-10-C robust accuracy, $\mathcal{A}_\text{cln}$ and $\mathcal{A}_\text{rob}$, under FedAvg, FedAugMix, and FedAvg+DART as a function of time and energy. The model used is ResNet-18, the server dataset is CIFAR-100, and 10 clients are deployed. FedAvg+DART takes the least energy and time to reach $\mathcal{A}_\text{rob}=80\%$.
  • Figure 5: CIFAR-10 clean and CIFAR-10-C robust accuracy, $\mathcal{A}_\text{cln}$ and $\mathcal{A}_\text{rob}$, under FedAvg, FedAugMix, and FedAvg+DART as a function of time and energy. The model used is MobileNet, the server dataset is CIFAR-100, and 10 clients are deployed.
  • ...and 5 more figures
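
The teacher-student setup described in the Figure 2 caption can be illustrated with a small sketch. This section does not give the form of $\mathcal{L}_\text{DART}$, so the code below assumes a generic distillation objective: the KL divergence between the teacher's prediction on a clean server-side input and the student's prediction on a corrupted copy of that input. The linear classifier, the Gaussian-noise corruption, and all function names are hypothetical stand-ins chosen for illustration, not the authors' implementation. The sketch only evaluates the objective and does not perform the student update.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def linear_logits(weights, x):
    """Toy stand-in for a network: one weight row per class (assumption)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def add_noise(x, sigma, rng):
    """Assumed corruption: additive Gaussian noise on each input feature."""
    return [xi + rng.gauss(0.0, sigma) for xi in x]


def distillation_loss(w_teacher, w_student, batch, sigma, rng):
    """Mean KL between teacher outputs on clean inputs and student outputs
    on corrupted inputs. An assumed objective, not the paper's exact loss."""
    total = 0.0
    for x in batch:
        p_teacher = softmax(linear_logits(w_teacher, x))
        p_student = softmax(linear_logits(w_student, add_noise(x, sigma, rng)))
        total += kl_divergence(p_teacher, p_student)
    return total / len(batch)


if __name__ == "__main__":
    rng = random.Random(0)
    # Pre-trained weights w_0 initialize both teacher and student (Figure 2).
    w0 = [[1.0, -0.5], [-0.3, 0.8], [0.2, 0.1]]
    w_teacher = [row[:] for row in w0]
    w_student = [row[:] for row in w0]
    # Server-side samples; no client data is involved.
    server_batch = [[0.5, 1.0], [-1.0, 0.3]]
    print(distillation_loss(w_teacher, w_student, server_batch, 0.5, rng))
```

Because teacher and student start from the same weights, the assumed loss is zero on uncorrupted inputs and becomes positive only once the student sees corrupted inputs. This matches the intuition that training targets robustness while the teacher anchors clean behavior.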

Theorems & Definitions (5)

  • Theorem 1: Clean Student Risk Bound under Distillation
  • Lemma 1
  • proof
  • Theorem: Clean Student Risk Bound under Distillation
  • proof