Flashback: Understanding and Mitigating Forgetting in Federated Learning

Mohammed Aljahdali, Ahmed M. Abdelmoniem, Marco Canini, Samuel Horváth

TL;DR

Flashback is a novel FL algorithm that uses a dynamic distillation approach to regularize the local models and effectively aggregate their knowledge. It achieves faster round-to-target accuracy, converging in 6 to 16 rounds and running up to $27\times$ faster.

Abstract

In Federated Learning (FL), forgetting, or the loss of knowledge across rounds, hampers algorithm convergence, particularly under severe data heterogeneity among clients. This study explores the nuances of this issue and emphasizes the critical role forgetting plays in FL's inefficient learning on heterogeneous data. Knowledge loss occurs in both client-local updates and server-side aggregation steps; addressing one without the other fails to mitigate forgetting. We introduce a metric that measures forgetting at a fine granularity, so that it is distinguished from the acquisition of new knowledge. Leveraging these insights, we propose Flashback, an FL algorithm with a dynamic distillation approach that regularizes the local models and effectively aggregates their knowledge. Across different benchmarks, Flashback outperforms other methods, mitigates forgetting, and achieves faster round-to-target accuracy, converging in 6 to 16 rounds.
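The abstract describes distillation as a regularizer on local training. The sketch below shows what such a regularized local objective can look like, assuming the teacher is the global model the client receives at the start of the round. It uses standard temperature-scaled knowledge distillation with a fixed weight `lam`. The function names and the `lam` and `temperature` parameters are hypothetical. Flashback's dynamic weighting is not specified in this section and is not reproduced here.

```python
import numpy as np


def softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Row-wise softmax with temperature, shifted by the row max for numerical stability."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def distillation_regularized_loss(
    student_logits: np.ndarray,
    teacher_logits: np.ndarray,
    labels: np.ndarray,
    lam: float = 1.0,
    temperature: float = 2.0,
) -> float:
    """Hypothetical local objective: cross-entropy on the client's labels plus a KD term.

    `lam` is a fixed stand-in weight; this is NOT Flashback's dynamic weighting.
    """
    eps = 1e-12
    n = student_logits.shape[0]

    # Supervised term: mean cross-entropy against the client's own labels.
    p_student = softmax(student_logits)
    ce = -np.mean(np.log(p_student[np.arange(n), labels] + eps))

    # Regularizer: KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 as in standard knowledge distillation.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.mean(np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1))

    return float(ce + lam * temperature**2 * kl)


# Usage with random logits: 8 samples, 10 classes.
rng = np.random.default_rng(0)
student = rng.normal(size=(8, 10))  # local model being trained
teacher = rng.normal(size=(8, 10))  # assumed: global model from the start of the round
labels = rng.integers(0, 10, size=8)
print(distillation_regularized_loss(student, teacher, labels, lam=0.5))
```

When the student matches the teacher exactly, the KL term vanishes and only the cross-entropy remains. Larger `lam` pulls the local model harder toward the teacher's predictions, which is how a distillation regularizer counteracts local forgetting.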

Paper Structure

This paper contains 15 sections, 6 equations, 15 figures, 1 table, and 1 algorithm.

Figures (15)

  • Figure 1: Performance of Flashback and other baselines over training rounds with CIFAR10.
  • Figure 2: Local (client) and global forgetting in some of the baselines using CIFAR10. The first row shows the per-class test accuracy of the global model at round $t-1$; the middle rows show all the clients that participated in round $t$; and the last row shows the global model at the end of round $t$. Local forgetting happens when the clients at round $t$ lose knowledge that the global model had at round $t-1$. Global forgetting happens when the global model at round $t$ loses knowledge that was present in the clients' models at round $t$.
  • Figure 3: Round-to-accuracy performance of Flashback and other baselines over training rounds.
  • Figure 4: Transition from the local models' loss to the global model's loss over the rounds.
  • Figure 5: (left) Per-class accuracy of a client model in all the rounds in which it participated. (right) Data distribution of that client.
  • ...and 10 more figures
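The Figure 2 caption defines local and global forgetting in terms of per-class accuracy. The sketch below turns that definition into numbers. It assumes forgetting on a class is the accuracy drop between a "before" and an "after" model, with gains clipped to zero and drops averaged over classes. For global forgetting, the "before" reference on each class is the best participating client. This is a hypothetical illustration of the caption, not necessarily the metric the paper introduces, and all function names are invented for this example.

```python
import numpy as np


def per_class_drop(before: np.ndarray, after: np.ndarray) -> np.ndarray:
    """Per-class accuracy lost between two models; accuracy gains count as zero."""
    return np.maximum(before - after, 0.0)


def local_forgetting(global_prev: np.ndarray, client_accs: np.ndarray) -> np.ndarray:
    """Mean per-class drop of each client relative to the round t-1 global model.

    global_prev: shape (C,), per-class test accuracy of the global model at round t-1.
    client_accs: shape (K, C), per-class test accuracy of the K clients at round t.
    Returns shape (K,), one local-forgetting score per client.
    """
    return per_class_drop(global_prev[None, :], client_accs).mean(axis=1)


def global_forgetting(client_accs: np.ndarray, global_new: np.ndarray) -> float:
    """Mean per-class drop of the round t global model vs. the best client on each class."""
    best_client = client_accs.max(axis=0)
    return float(per_class_drop(best_client, global_new).mean())


# Toy example with 3 classes and 2 clients (made-up accuracies, not paper results).
global_prev = np.array([0.9, 0.8, 0.7])
client_accs = np.array([[0.95, 0.2, 0.1],
                        [0.1, 0.85, 0.6]])
global_new = np.array([0.6, 0.7, 0.5])
print(local_forgetting(global_prev, client_accs))  # ≈ [0.4, 0.3]
print(global_forgetting(client_accs, global_new))  # ≈ 0.2
```

Clipping at zero keeps new knowledge (accuracy gains on other classes) from cancelling out losses. This matches the abstract's point that forgetting must be recognized separately from the acquisition of new knowledge.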