An Aggregation-Free Federated Learning for Tackling Data Heterogeneity

Yuan Wang; Huazhu Fu; Renuga Kanagavelu; Qingsong Wei; Yong Liu; Rick Siow Mong Goh

TL;DR

Federated Learning (FL) with non-IID client data suffers from client drift and slow convergence under the traditional aggregate-then-adapt framework. The paper introduces FedAF, an aggregation-free FL algorithm in which clients learn condensed data and the server trains the global model on these condensed data plus soft labels received from the clients. Clients update their condensed data with Distribution Matching (DM) against local real data and a Collaborative Data Condensation (CDC) term that draws on peer knowledge, while the server transfers cross-client knowledge through Local-Global Knowledge Matching (LGKM). Key technical ingredients include $L_{DM}$, built on the per-class feature means $\mu_{k,c}^{real}$ and $\mu_{k,c}^{syn}$; a CDC term based on the sliced Wasserstein distance (SWD); $L_{LGKM}$, based on the KL divergence between local and global logits; and a model re-sampling step $\mathbf{w} \leftarrow \gamma \mathbf{w} + (1-\gamma)\tilde{\mathbf{w}}$. Experiments on FMNIST, CIFAR-10/100, and DomainNet show that FedAF achieves accuracy gains of up to 25.44% and up to 80% faster convergence compared with state-of-the-art baselines, especially under strong heterogeneity. The work demonstrates a practical, privacy-aware route to robust, fast-converging FL in heterogeneous environments.
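The client-side quantities named above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: $L_{DM}$ is assumed to be the squared L2 distance between $\mu_{k,c}^{real}$ and $\mu_{k,c}^{syn}$ summed over classes (the usual Distribution Matching form); the sliced Wasserstein distance is a generic Monte Carlo estimator, and which point sets FedAF's CDC term actually feeds it are not specified here; $\tilde{\mathbf{w}}$ in the re-sampling step is simply a second set of weights supplied by the caller. The names `dm_loss`, `sliced_wasserstein`, and `resample` are hypothetical. Features are plain arrays here; in practice they would come from an embedding network, and the losses would be differentiated with respect to the condensed data.

```python
import numpy as np

def dm_loss(feat_real, y_real, feat_syn, y_syn, num_classes):
    """Assumed L_DM: sum over classes c of ||mu_real_c - mu_syn_c||^2."""
    loss = 0.0
    for c in range(num_classes):
        real_c = feat_real[y_real == c]
        syn_c = feat_syn[y_syn == c]
        if len(real_c) == 0 or len(syn_c) == 0:
            continue  # class absent on this client (label skew): no term
        mu_real = real_c.mean(axis=0)  # per-class mean of real features
        mu_syn = syn_c.mean(axis=0)    # per-class mean of synthetic features
        loss += float(np.sum((mu_real - mu_syn) ** 2))
    return loss

def sliced_wasserstein(x, y, num_projections=64, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-2 distance between
    two equal-size point sets x and y of shape (n, d)."""
    assert x.shape == y.shape, "this simple estimator needs equal-size sets"
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(num_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # random unit directions
    # In 1-D, W2 between equal-size empirical measures pairs the sorted samples.
    proj_x = np.sort(x @ theta.T, axis=0)
    proj_y = np.sort(y @ theta.T, axis=0)
    return float(np.sqrt(np.mean((proj_x - proj_y) ** 2)))

def resample(w, w_tilde, gamma):
    """Model re-sampling w <- gamma * w + (1 - gamma) * w_tilde, per parameter tensor."""
    return {name: gamma * w[name] + (1.0 - gamma) * w_tilde[name] for name in w}

# Usage on toy features: 2 classes, 10 real and 2 synthetic samples per class.
rng = np.random.default_rng(1)
feat_real = rng.normal(size=(20, 4))
y_real = np.repeat(np.arange(2), 10)
feat_syn = rng.normal(size=(4, 4))
y_syn = np.repeat(np.arange(2), 2)
print(dm_loss(feat_real, y_real, feat_syn, y_syn, num_classes=2))
print(sliced_wasserstein(feat_real, feat_real + 1.0))
```

The sorted-projection trick is what makes SWD cheap: each 1-D Wasserstein distance reduces to comparing sorted samples, so the cost is dominated by sorting rather than by solving an optimal-transport problem.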

Abstract

The performance of Federated Learning (FL) hinges on the effectiveness of utilizing knowledge from distributed datasets. Traditional FL methods adopt an aggregate-then-adapt framework, where clients update local models based on a global model aggregated by the server from the previous training round. This process can cause client drift, especially with significant cross-client data heterogeneity, impacting model performance and convergence of the FL algorithm. To address these challenges, we introduce FedAF, a novel aggregation-free FL algorithm. In this framework, clients collaboratively learn condensed data by leveraging peer knowledge, and the server subsequently trains the global model using the condensed data and soft labels received from the clients. FedAF inherently avoids the issue of client drift, enhances the quality of condensed data amid notable data heterogeneity, and improves the global model performance. Extensive numerical studies on several popular benchmark datasets show that FedAF surpasses various state-of-the-art FL algorithms in handling label-skew and feature-skew data heterogeneity, leading to superior global model accuracy and faster convergence.
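To make the aggregation-free data flow concrete, the following toy NumPy sketch runs a single round under deliberately simplified assumptions: two clients with disjoint labels (extreme label skew) each upload a condensed set, and the server trains a fresh softmax-regression model on the pooled condensed data, so no client weights are ever averaged. The `condense` function is a hypothetical placeholder that only returns class means; FedAF instead learns its condensed data with DM and CDC losses, and its server also uses soft labels, both omitted here. All names, the Gaussian toy data, and the hyperparameters are illustrative.

```python
import numpy as np

NUM_CLASSES, DIM = 3, 2
# Hypothetical toy setup: three Gaussian classes centred on a circle of radius 4.
CENTERS = np.array([[4.0, 0.0], [-2.0, 2.0 * np.sqrt(3)], [-2.0, -2.0 * np.sqrt(3)]])

def make_data(rng, classes, n_per_class=50):
    """Toy local dataset containing only the listed classes (label skew)."""
    x = np.concatenate([CENTERS[c] + rng.normal(size=(n_per_class, DIM)) for c in classes])
    y = np.repeat(np.array(classes), n_per_class)
    return x, y

def condense(x, y, ipc=1):
    """Placeholder for client-side condensation: IPC copies of each local class mean.
    FedAF instead learns S_k with DM and CDC losses."""
    xs, ys = [], []
    for c in np.unique(y):
        xs.append(np.repeat(x[y == c].mean(axis=0, keepdims=True), ipc, axis=0))
        ys.append(np.full(ipc, c))
    return np.concatenate(xs), np.concatenate(ys)

def train_softmax(x, y, epochs=1000, lr=0.1):
    """Server-side training of a softmax-regression 'global model' by gradient descent."""
    w, b = np.zeros((DIM, NUM_CLASSES)), np.zeros(NUM_CLASSES)
    onehot = np.eye(NUM_CLASSES)[y]
    for _ in range(epochs):
        logits = x @ w + b
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / len(x)  # gradient of mean cross-entropy w.r.t. logits
        w -= lr * (x.T @ grad)
        b -= lr * grad.sum(axis=0)
    return w, b

rng = np.random.default_rng(0)
clients = [make_data(rng, [0, 1]), make_data(rng, [2])]  # clients see disjoint labels
# Clients upload condensed data only; the server never averages client weights.
uploads = [condense(x, y) for x, y in clients]
x_syn = np.concatenate([u[0] for u in uploads])
y_syn = np.concatenate([u[1] for u in uploads])
w, b = train_softmax(x_syn, y_syn)

x_test, y_test = make_data(rng, [0, 1, 2], n_per_class=100)
acc = float(np.mean(np.argmax(x_test @ w + b, axis=1) == y_test))
print(f"server-model accuracy on toy test data: {acc:.3f}")
```

Because the server fits the global model directly on data that covers every client's classes, no client ever adapts a downloaded global model to its own skewed labels, which is the drift mechanism the aggregation-free paradigm is designed to avoid.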

Paper Structure (12 sections, 17 equations, 9 figures, 7 tables)

Introduction
Background and Related Works
Notations and Preliminaries
The Proposed Method
Experiments
Results for Label-skew Data Heterogeneity
Result for Feature-skew Data Heterogeneity
Performance Analysis of FedAF
Conclusion
Implementation Details
More Experiment Results with ResNet18
Communication Cost Analysis

Figures (9)

Figure 1: The conventional aggregate-then-adapt approach (a) is prone to client drift in data-heterogeneous scenarios, as clients update a downloaded global model and risk forgetting prior knowledge. In contrast, the aggregation-free paradigm (b) has the server train the global model directly using condensed synthetic data learned and shared by clients, which circumvents the client drift issue.
Figure 2: Overview of FedAF's workflow. Left: Clients download the global model $\mathbf{w}$ and the class-wise mean logits $\mathcal{V}$, averaged from $\mathcal{V}_k$ at the server. They then update the condensed data $\mathcal{S}_k$ using a combination of Distribution Matching (DM) loss and Collaborative Data Condensation (CDC) loss, with local real data $\mathcal{D}_k$ and $\mathcal{V}$ as inputs. Right: The server updates the global model $\mathbf{w}$ by employing both cross-entropy loss and Local-Global Knowledge Matching (LGKM) loss. This utilizes both condensed data $\mathcal{S}_k$ and soft labels $\mathcal{R}_k$ received from each client $k \in \{1,2,\dots, N\}$. The entire process iterates over a pre-defined number of communication rounds.
Figure 3: Comparison of convergence performance amongst baseline approaches. (a) to (c): learning curves obtained on CIFAR10; (d) to (f): learning curves obtained on CIFAR100; (g) to (i): learning curves obtained on FMNIST. In addition to the improvement in accuracy, FedAF also delivers considerably faster convergence, especially on harder datasets.
Figure 4: Comparison of convergence performance amongst baseline approaches on the DomainNet dataset. FedAF also outperforms the other baselines in both accuracy and convergence under a feature-skew heterogeneous data distribution.
Figure 5: Impact of IPC (images per class) on the learning performance of FedAF on CIFAR10. (a) to (c): the resulting learning curves; (d): improvement in model accuracy compared to FedAvg. Generally, a higher IPC correlates with enhanced performance, and all tested IPC values improve over FedAvg.
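The server-side step shown in Figure 2 (averaging the clients' class-wise mean logits $\mathcal{V}_k$ into $\mathcal{V}$, then training $\mathbf{w}$ with cross-entropy plus LGKM on the condensed data $\mathcal{S}_k$ and soft labels $\mathcal{R}_k$) can be sketched as a loss computation in NumPy. This is a hedged illustration, not the paper's exact formulation: the unweighted average for $\mathcal{V}$, the KL direction, the per-class comparison against the global model's mean logits, the temperature `tau`, the weight `lam`, and how $\mathcal{R}_k$ is produced are all assumptions, and every function name is hypothetical. Only loss values are computed; a real implementation would backpropagate this loss into $\mathbf{w}$.

```python
import numpy as np

def softmax(z, tau=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = np.asarray(z, dtype=float) / tau
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def average_class_logits(client_logits):
    """V as the unweighted mean of the clients' V_k; each V_k is (C, C),
    row c holding the mean logit vector of class c on that client (assumed layout)."""
    return np.mean(np.stack(client_logits), axis=0)

def cross_entropy(logits, labels):
    p = softmax(logits)
    return float(-np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12)))

def lgkm_loss(global_logits, labels, soft_labels, tau=2.0):
    """Assumed LGKM form: for each class c in the condensed batch,
    KL(R[c] || softmax(mean global logits of class c / tau)), averaged over classes."""
    kls = []
    for c in np.unique(labels):
        p_local = soft_labels[c]
        p_global = softmax(global_logits[labels == c].mean(axis=0), tau)
        kls.append(np.sum(p_local * (np.log(p_local + 1e-12) - np.log(p_global + 1e-12))))
    return float(np.mean(kls))

def server_loss(global_logits, labels, soft_labels, lam=1.0, tau=2.0):
    """Cross-entropy on condensed data plus lam * LGKM (lam and tau are hypothetical)."""
    return cross_entropy(global_logits, labels) + lam * lgkm_loss(global_logits, labels, soft_labels, tau)

# Usage: 3 classes, global-model logits on 6 condensed samples from one client.
rng = np.random.default_rng(0)
V = average_class_logits([rng.normal(size=(3, 3)) for _ in range(2)])  # sent back to clients
logits = rng.normal(size=(6, 3))
labels = np.array([0, 0, 1, 1, 2, 2])
R_k = softmax(rng.normal(size=(3, 3)), tau=2.0)  # hypothetical soft labels from client k
print(server_loss(logits, labels, R_k))
```

The LGKM term vanishes exactly when the global model's class-wise predictions on the condensed data match the client's soft labels, so it acts as a pull toward local knowledge on top of the hard-label cross-entropy.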