Forget to Generalize: Iterative Adaptation for Generalization in Federated Learning

Abdulrahman Alotaibi; Irene Tenison; Miriam Kim; Isaac Lee; Lalana Kagal

TL;DR

This work tackles poor generalization in non-IID federated learning by introducing Iterative Federated Adaptation (IFA), which partitions training into $G$ generations, each with $C$ rounds and $E$ local epochs, and at the end of each generation resets a fraction $\rho$ of parameters to forget client-specific biases. Two reset strategies are proposed, Random Parameter Selection and Later Layer Selection, which let the model re-learn generalizable representations while preserving useful prior knowledge. Across CIFAR-10, MIT Indoors, and Stanford Dogs, IFA yields an average improvement of $21.5\%$ and proves robust across IID/Non-IID settings and varying client counts, while remaining agnostic to the underlying aggregation method. The approach, inspired by continual learning, offers a practical, plug-and-play enhancement for privacy-preserving, web-scale FL systems. Future work includes developing adaptive reset schedules and providing theoretical analyses of generalization-accuracy trade-offs in large-scale federated deployments.

Abstract

The Web is naturally heterogeneous: user devices, geographic regions, browsing patterns, and contexts all lead to highly diverse, unique datasets. Federated Learning (FL) is an important paradigm for the Web because it enables privacy-preserving, collaborative machine learning across diverse user devices, web services, and clients without centralizing sensitive data. However, its performance degrades severely under the non-IID client distributions that are prevalent in real-world web systems. In this work, we propose a new training paradigm, Iterative Federated Adaptation (IFA), that enhances generalization in heterogeneous federated settings through a generation-wise forget-and-evolve strategy. Specifically, we divide training into multiple generations and, at the end of each, select a fraction of model parameters (a) randomly or (b) from the later layers of the model and reinitialize them. This iterative forget-and-evolve schedule allows the model to escape local minima and preserve globally relevant representations. Extensive experiments on the CIFAR-10, MIT-Indoors, and Stanford Dogs datasets show that the proposed approach improves global accuracy, especially when the data across clients are Non-IID. The method can be implemented on top of any federated algorithm to improve its generalization performance. We observe an average improvement of 21.5% across datasets. This work advances the vision of scalable, privacy-preserving intelligence for real-world heterogeneous and distributed web systems.
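
The forget-and-evolve schedule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the model is represented as a plain dictionary of layer names to weight lists (assumed to be ordered from first to last layer), the helper names (`reinit_value`, `random_parameter_reset`, `later_layer_reset`, `iterative_federated_adaptation`) are hypothetical, the Gaussian re-initializer is an assumption, and one federated round (local epochs plus aggregation) is abstracted into a caller-supplied `run_round` function. The sketch also assumes that later-layer selection resets whole layers and that no reset is applied after the final generation; the abstract does not specify either detail.

```python
import random


def reinit_value(rng):
    # Assumption: small Gaussian noise; the source does not name an initializer.
    return rng.gauss(0.0, 0.01)


def random_parameter_reset(model, rho, rng):
    """Strategy (a): reinitialize a fraction rho of scalar parameters chosen at random."""
    index = [(name, i) for name, weights in model.items() for i in range(len(weights))]
    chosen = rng.sample(index, int(rho * len(index)))
    for name, i in chosen:
        model[name][i] = reinit_value(rng)
    return chosen


def later_layer_reset(model, rho, rng):
    """Strategy (b): reinitialize whole layers, starting from the last one,
    until at least a fraction rho of all parameters has been covered."""
    total = sum(len(weights) for weights in model.values())
    budget = int(rho * total)
    covered, reset_layers = 0, []
    for name in reversed(list(model)):  # assumes dict order follows layer depth
        if covered >= budget:
            break
        model[name] = [reinit_value(rng) for _ in model[name]]
        covered += len(model[name])
        reset_layers.append(name)
    return reset_layers


def iterative_federated_adaptation(model, G, C, rho, run_round, reset, rng):
    """G generations of C federated rounds each, with a reset between generations."""
    for g in range(G):
        for _ in range(C):
            # One round: local training on each client plus server aggregation.
            model = run_round(model)
        if g < G - 1:  # assumption: keep the final generation's weights intact
            reset(model, rho, rng)
    return model
```

Because `run_round` is passed in, the same loop can wrap any aggregation rule, which mirrors the claim that the method sits on top of an existing federated algorithm.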

Paper Structure (17 sections, 2 equations, 1 figure, 3 tables, 1 algorithm)

Introduction
Related Works
Federated Iterative Adaptation
Method Overview
Algorithmic Framework
Parameter Selection Strategies
Strategy 1: Random Parameter Selection
Strategy 2: Later Layer Selection
Why Periodic Reset Mitigates Non-IID Drift
Representational Drift in Federated Non-IID Settings
The Forget-and-Evolve Mechanism as a Regularizer
Agnosticity to Aggregation Methods
Results & Discussion
Evaluation Setup
Performance Comparison
...and 2 more sections

Figures (1)

Figure 1: (Left) Global model generalization performance and (Right) loss on the local model at client 0 of IFA (Ours) and Vanilla FL (FedAVG).
