FedRC: Tackling Diverse Distribution Shifts Challenge in Federated Learning by Robust Clustering

Yongxin Guo; Xiaoying Tang; Tao Lin

TL;DR

This work addresses the challenge of diverse and simultaneous distribution shifts in Federated Learning by proposing FedRC, a soft-clustering framework built on RobustCluster. RobustCluster casts clustering as a bi-level optimization over cluster parameters $\boldsymbol{\Theta}$ and per-source weights $\boldsymbol{\Omega}$, utilizing a robust objective $\mathcal{L}(\boldsymbol{\Theta},\boldsymbol{\Omega})$ that promotes separation of concept shifts while accommodating feature and label shifts via a mixture model. FedRC adapts this centralized clustering approach to FL, enabling cluster-specific global models through an iterative E-step / M-step style optimization and FedAvg aggregation, with theoretical convergence guarantees under standard assumptions. Empirical results on FashionMNIST, CIFAR10/100, and Tiny-ImageNet across CNNs and MobileNetV2/ResNet18 demonstrate that FedRC outperforms state-of-the-art clustered FL baselines, remains robust to cluster imbalances and varying concept numbers, and can exploit adaptive enhancements (FedRC-Adam) to accelerate convergence and even infer the number of concepts. The work provides a principled framework for robust clustering in privacy-preserving, heterogeneous FL systems and lays groundwork for future extensions like adaptive concept-counting and test-time adaptation.
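
The iterative E-step / M-step loop with weighted aggregation can be illustrated with a toy sketch. It is not the FedRC objective: the soft weights here use a generic mixture-style rule (proportional to `exp(-loss)`), the models are 1-D linear regressors, and every name (`e_step`, `federated_round`, `clients`) is hypothetical. Two concepts (`y = 2x` and `y = -2x`) are spread over four clients whose feature ranges differ, so clients that differ only in features should end up sharing a cluster model.

```python
import math

def mse(theta, data):
    """Mean squared error of the 1-D model y = theta * x on one client's data."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def mse_grad(theta, data):
    """Gradient of the mean squared error with respect to theta."""
    return sum(2 * (theta * x - y) * x for x, y in data) / len(data)

def e_step(thetas, data):
    """Soft cluster weights for one client, proportional to exp(-loss).

    Assumption: a generic mixture-style responsibility, not FedRC's objective.
    """
    losses = [mse(t, data) for t in thetas]
    best = min(losses)  # subtract the minimum for numerical stability
    unnorm = [math.exp(-(loss - best)) for loss in losses]
    total = sum(unnorm)
    return [u / total for u in unnorm]

def federated_round(thetas, clients, lr=0.1):
    """One round: E-step per client, one local gradient step, weighted averaging."""
    weights = [e_step(thetas, data) for data in clients]
    new_thetas = []
    for k, theta in enumerate(thetas):
        num, den = 0.0, 0.0
        for data, w in zip(clients, weights):
            local = theta - lr * mse_grad(theta, data)  # local update of cluster model k
            coef = w[k] * len(data)  # soft weight times local sample count
            num += coef * local
            den += coef
        new_thetas.append(num / den if den > 0 else theta)
    return new_thetas, weights

clients = [
    [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],     # concept A: y = 2x
    [(0.5, 1.0), (1.5, 3.0)],                 # concept A, different feature range
    [(1.0, -2.0), (2.0, -4.0), (3.0, -6.0)],  # concept B: y = -2x
    [(0.5, -1.0), (1.5, -3.0)],               # concept B, different feature range
]
thetas = [0.5, -0.5]  # initial cluster models
for _ in range(30):
    thetas, weights = federated_round(thetas, clients)
print([round(t, 3) for t in thetas])
```

In this toy setting the two cluster models separate the two concepts, while the two clients of each concept share one model despite their different feature ranges.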

Abstract

Federated Learning (FL) is a machine learning paradigm that safeguards privacy by retaining client data on edge devices. However, optimizing FL in practice can be challenging due to the diverse and heterogeneous nature of the learning system. Though recent research has focused on improving the optimization of FL when distribution shifts occur among clients, ensuring global performance when multiple types of distribution shifts (feature distribution shift, label distribution shift, and concept shift) occur simultaneously among clients remains under-explored. In this paper, we identify the learning challenges posed by the simultaneous occurrence of diverse distribution shifts and propose a clustering principle to overcome these challenges. Through our research, we find that existing methods fail to address the clustering principle. Therefore, we propose a novel clustering algorithm framework, dubbed FedRC, which adheres to our proposed clustering principle by incorporating a bi-level optimization problem and a novel objective function. Extensive experiments demonstrate that FedRC significantly outperforms other SOTA cluster-based FL methods. Our code is available at https://github.com/LINs-lab/FedRC.

Paper Structure (79 sections, 6 theorems, 55 equations, 16 figures, 15 tables, 3 algorithms)

Introduction
Distribution shifts in FL.
New challenges posed by the simultaneous occurrence of multiple types of distribution shifts.
Divide-and-Conquer as a solution.
Related Works
Federated Learning with distribution shifts.
Clustered Federated Learning.
Revisiting Clustered FL: A Diverse Distribution Shifts Perspective
Algorithms and experiment settings.
Existing clustered FL methods fail to achieve the principles of robust clustering.
Our approach: FedRC
RobustCluster: Training Robust Global Models for Each Concept
Clustering via bi-level optimization.
Objective function of RobustCluster.
Interpretation of the objective function.
...and 64 more sections

Key Result

Theorem 4.3

Assume $f_{ik}$ satisfies the Smoothness Assumption through the Bounded Gradient Assumption. Setting $T$ as the number of iterations and $\eta = \frac{8}{40L + 9\sigma^2}$, we have, where $\mathcal{L}^{\star}$ is the upper bound of $\mathcal{L}(\boldsymbol{\Theta}, \boldsymbol{\Omega})$ and $\mathcal{L}^0 = \mathcal{L}(\boldsymbol{\Theta}^{0}, \boldsymbol{\Omega}^{0})$. Proof details refer to
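
The step size in the theorem is a closed-form expression, so it can be evaluated directly. The helper below is a minimal sketch; the function name `step_size` is hypothetical, and reading $L$ as the smoothness constant and $\sigma^2$ as the gradient bound is an assumption based on the names of the two assumptions, since neither constant is defined in this excerpt.

```python
def step_size(L, sigma_sq):
    """Evaluate eta = 8 / (40 L + 9 sigma^2), the step size stated in Theorem 4.3.

    Assumption: L is the smoothness constant and sigma_sq the gradient bound.
    """
    if L <= 0 or sigma_sq < 0:
        raise ValueError("expected L > 0 and sigma_sq >= 0")
    return 8.0 / (40.0 * L + 9.0 * sigma_sq)

# Example values chosen only for illustration.
print(step_size(1.0, 1.0))  # 8 / 49
```

Under this formula a larger $L$ or $\sigma^2$ gives a smaller step.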

Figures (16)

Figure 1: Illustration of our principles of robust clustering. Each circle represents a client, with points (features) of varying colors indicating distinct labels. Label shifts are represented by clients exhibiting data points of varying colors, as seen in clients 1 and 2. Feature shifts are exemplified by clients maintaining data points with the same color but having substantial distances between them, as observed in clients 2 and 3. Concept shifts occur when data points at the same position have different labels, as evident in clients 2 and 5. Dashed lines in different colors depict decision boundaries for classifiers of different clusters, i.e., $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$. Figure \ref{fig:illustration-single-model} demonstrates that single-model methods are inadequate for handling concept shifts. Figure \ref{fig:illustration-multi-model} shows that current multi-model methods tend to overfit local distributions and cannot handle unseen data, like the data points in the top-left corner of Figure \ref{fig:illustration-multi-model}. Our method (Figure \ref{fig:illustration-ours}) improves model generalization by grouping clients with concept shifts into distinct clusters, while ensuring that clients with only feature or label shifts are placed in the same clusters.
Figure 2: Performance degradation of existing clustered FL methods. Figure \ref{fig:gains} presents the global performance improvements of these methods and our FedRC compared to FedAvg. Figure \ref{fig:gap} presents the local-global performance gap of these algorithms. Figure \ref{fig:add-prox} illustrates the performance of clustered FL when naively combined with single-model methods, such as FedProx \citep{li2018federated} and FedDecorr \citep{shi2022towards}. The global distributions are label- and feature-balanced for each concept.
Figure 3: Clustering results w.r.t. classes/feature styles/concepts. After data construction, each data point $\mathbf{x}$ will have a class $y$, feature style $f$, and concept $c$. We report the percentage of data points associated with a class, feature style, or concept assigned to cluster $k$. For example, for a circle centered at position $(y, k)$, a larger circle size signifies that more data points with class $y$ are assigned to cluster $k$. For feature styles, we only represent $f \in [1, 10]$ here for clearer representation, and the full version can be found in Figure \ref{fig:clustering results appendix} of Appendix \ref{sec:Additional Experiment Results}. By the principles of robust clustering, we require a clustering method in which clients with the same concept are assigned to the same cluster (for example, Figure \ref{fig:clustering results of algfed on concept}).
Figure 4: Clustering results of FedRC w.r.t. classes/concepts. The number of clusters is selected within the range of $[3, 4]$, while keeping the remaining settings consistent with those used in Figure \ref{fig:clustering results}. Due to page limitations, we present the clustering results w.r.t. feature styles in Figure \ref{fig:clustering results of algfed appendix} of Appendix \ref{sec:ablation study appendix}.
Figure 5: Illustration of our numerical evaluation protocols. Clients are divided into two categories: participating clients engage in training, while nonparticipating clients are used for testing. The training and test distributions of each client are identical. Participating clients simulate real-world scenarios and may experience label, feature, and concept shifts. For example, clients 1 and 2 have different label distributions and feature styles (photo or cartoon), while clients 2 and 3 have concept shifts (labels swapped). Nonparticipating clients are utilized to test the robustness of models. Labels on nonparticipating clients are swapped in the same manner as participating clients for each concept.
...and 11 more figures
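
The Figure 5 caption describes concept shift as swapped labels and label shift as differing label distributions. The snippet below is a small sketch of both ideas on plain Python lists; the helper names (`swap_labels`, `label_distribution`) and the toy label lists are hypothetical and do not reproduce the paper's data construction.

```python
from collections import Counter

def swap_labels(labels, a, b):
    """Return a copy of labels with classes a and b exchanged (a new concept)."""
    mapping = {a: b, b: a}
    return [mapping.get(y, y) for y in labels]

def label_distribution(labels):
    """Fraction of samples per class, used to compare clients for label shift."""
    counts = Counter(labels)
    return {c: counts[c] / len(labels) for c in sorted(counts)}

# Label shift: clients 1 and 2 hold the same classes in different proportions.
client_1 = [0, 0, 0, 1]
client_2 = [0, 1, 1, 1]

# Concept shift: client 3 sees the same inputs as client 2, but classes 0 and 1 are swapped.
client_3 = swap_labels(client_2, 0, 1)

print(label_distribution(client_1), label_distribution(client_2))
print(client_3)
```

Applying the same swap to held-out clients mirrors the caption's description of how nonparticipating clients are labeled for each concept.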

Theorems & Definitions (13)

Remark 4.1: Property of $\gamma_{i, j; k}$
Remark 4.2: Compare with existing bi-level optimization methods
Theorem 4.3: Convergence rate of RobustCluster
Theorem 1.1
proof
Lemma 2.1
proof
Lemma 2.2
proof
Theorem 2.3: Convergence rate of RobustCluster
...and 3 more
