DistDD: Distributed Data Distillation Aggregation through Gradient Matching

Peiran Wang, Haohan Wang

TL;DR

This paper provides a detailed convergence proof of the DistDD algorithm, reinforcing its mathematical stability and reliability for practical applications, and demonstrates the effectiveness and robustness of DistDD, particularly in non-i.i.d. and mislabeled data scenarios.

Abstract

In this paper, we introduce DistDD, a novel approach within the federated learning (FL) framework that reduces the need for repetitive communication by distilling data directly on clients' devices. Unlike traditional federated learning, which requires iterative model updates across nodes, DistDD performs a one-time distillation process that extracts a global distilled dataset, maintaining the privacy standards of federated learning while significantly cutting communication costs. Using DistDD's distilled dataset, FL developers can carry out just-in-time parameter tuning and neural architecture search (NAS) without repeating the whole FL process multiple times. We provide a detailed convergence proof of the DistDD algorithm, reinforcing its mathematical stability and reliability for practical applications. Our experiments demonstrate the effectiveness and robustness of DistDD, particularly in non-i.i.d. and mislabeled data scenarios, showcasing its potential to handle complex real-world data challenges differently from conventional federated learning methods. We also evaluate DistDD in the NAS use case and demonstrate its effectiveness and communication savings there.

Paper Structure

This paper contains 26 sections, 41 equations, 10 figures, 1 table, 1 algorithm.

Figures (10)

  • Figure 1: Two use cases for DistDD: parameter tuning and NAS. In typical FL, each requires running the whole FL process multiple times. With DistDD, the server first acquires the distilled dataset through the DistDD process, then repeats tuning and NAS locally on the FL server to find the optimal network architecture and parameters. The server then runs the FL process only once, using the optimal architecture and parameters.
  • Figure 2: Further study of mislabeled data. Because developers may mistakenly or maliciously mislabel the input dataset, we evaluated DistDD's performance in this setting. The fraction of mislabeled data is varied from 0.0 to 1.0.
  • Figure 3: Considering the non-i.i.d. nature of federated learning, we studied how a non-i.i.d. data distribution affects the performance of DistDD. We use the Dirichlet distribution to model the non-i.i.d. distribution.
  • Figure 4: Use case evaluation for DistDD. The results indicate that using DistDD for Neural Architecture Search (NAS) over Federated Learning (FL) is as effective as the traditional FedAvg approach in terms of accuracy. However, DistDD offers a significant advantage in time cost, especially as the number of tuning iterations increases. This is because, unlike FedAvg, DistDD requires less communication after the initial tuning, giving a more efficient trade-off between time and performance.
  • Figure 5: Time overhead comparison between FedAvg and DistDD for different numbers of hyperparameter tuning runs.
  • ...and 5 more figures
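
The Figure 3 caption says the non-i.i.d. setting is modeled with a Dirichlet distribution. The sketch below shows one common way to do that: for each class, draw client proportions from a symmetric Dirichlet and split that class's samples accordingly. This is an illustration under assumptions, not the paper's code: the function names, the `alpha` concentration parameter, and the per-class splitting scheme are all hypothetical choices made here. Smaller `alpha` gives more skewed clients; larger `alpha` approaches an even split.

```python
import random
from collections import defaultdict


def dirichlet_sample(alpha, k, rng):
    """Draw one sample from a symmetric Dirichlet(alpha) over k categories.

    Uses the standard construction: normalize k independent Gamma(alpha, 1) draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Extremely small alpha can underflow to all zeros; fall back to one-hot.
        one_hot = [0.0] * k
        one_hot[rng.randrange(k)] = 1.0
        return one_hot
    return [d / total for d in draws]


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Assign sample indices to clients with Dirichlet-distributed class shares.

    Hypothetical helper: returns a list of index lists, one per client.
    Every index is assigned to exactly one client.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        props = dirichlet_sample(alpha, num_clients, rng)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += props[c]
            if c == num_clients - 1:
                end = len(idxs)  # last client takes whatever remains
            else:
                end = int(round(cumulative * len(idxs)))
                end = min(max(end, start), len(idxs))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


if __name__ == "__main__":
    # Toy labels: 10 classes, 100 samples each (assumed, for illustration only).
    toy_labels = [i % 10 for i in range(1000)]
    for a in (0.1, 1.0, 100.0):
        parts = dirichlet_partition(toy_labels, num_clients=5, alpha=a, seed=1)
        print(f"alpha={a}: client sizes = {[len(p) for p in parts]}")
```

Running the script prints the client sizes for three values of `alpha`, which gives a quick feel for how the concentration parameter controls the skew.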