Towards Understanding and Mitigating Dimensional Collapse in Heterogeneous Federated Learning

Yujun Shi; Jian Liang; Wenqing Zhang; Vincent Y. F. Tan; Song Bai

TL;DR

Dimensional collapse of representations under data heterogeneity is identified as a core challenge in federated learning, affecting both local and global models. The authors provide a gradient-flow-based theoretical explanation showing how heterogeneous label distributions bias weight matrices toward low rank, leading to collapsed representations, and argue that the collapse of the global model is inherited from the local models. They propose FedDecorr, a plug-and-play regularizer that decorrelates representations by penalizing the Frobenius norm of their correlation matrix, implemented efficiently with z-score normalization. Empirical results on CIFAR-10/100 and TinyImageNet show consistent gains over the baselines, especially under strong heterogeneity.

Abstract

Federated learning aims to train models collaboratively across different clients without sharing data, for privacy reasons. However, one major challenge for this learning paradigm is the data heterogeneity problem, which refers to the discrepancies between the local data distributions of the various clients. To tackle this problem, we first study how data heterogeneity affects the representations of the globally aggregated models. Interestingly, we find that heterogeneous data results in the global model suffering from severe dimensional collapse, in which representations tend to reside in a lower-dimensional space instead of the ambient space. Moreover, we observe a similar phenomenon in models locally trained on each client and deduce that the dimensional collapse of the global model is inherited from local models. In addition, we theoretically analyze the gradient flow dynamics to shed light on how data heterogeneity results in dimensional collapse for local models. To remedy this problem, we propose FedDecorr, a novel method that can effectively mitigate dimensional collapse in federated learning. Specifically, FedDecorr applies a regularization term during local training that encourages different dimensions of representations to be uncorrelated. FedDecorr, which is implementation-friendly and computationally efficient, yields consistent improvements over baselines on standard benchmark datasets. Code: https://github.com/bytedance/FedDecorr.
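The decorrelation regularizer described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that); the function name `decorrelation_penalty`, the `eps` stabilizer, and the `1/d^2` scaling are assumptions made for illustration. The sketch z-score normalizes each representation dimension over the batch, forms the correlation matrix, and returns its squared Frobenius norm divided by `d^2`.

```python
import numpy as np


def decorrelation_penalty(z, eps=1e-8):
    """Squared Frobenius norm of the batch correlation matrix, scaled by 1/d^2.

    z is an (n, d) batch of representations. The name, eps, and the 1/d^2
    scaling are illustrative assumptions, not taken from the authors' code.
    """
    z = np.asarray(z, dtype=np.float64)
    n, d = z.shape
    if n < 2:
        return 0.0  # a correlation matrix needs at least two samples
    # z-score normalization: zero mean and unit variance per dimension
    z = (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps)
    corr = z.T @ z / n  # (d, d) correlation matrix
    return float(np.sum(corr ** 2) / d ** 2)


rng = np.random.default_rng(0)
# Representations that use all 8 dimensions independently
spread = rng.standard_normal((512, 8))
# Collapsed representations: every dimension is a copy of one signal
collapsed = np.outer(rng.standard_normal(512), np.ones(8))

penalty_spread = decorrelation_penalty(spread)
penalty_collapsed = decorrelation_penalty(collapsed)
```

With this scaling the penalty is close to 1 when all dimensions are perfectly correlated and approaches its minimum of `1/d` when they are uncorrelated, so adding it to the local training loss (weighted by a coefficient such as the paper's $\beta$) pushes representations away from collapse.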

Paper Structure (37 sections, 4 theorems, 52 equations, 11 figures, 10 tables, 1 algorithm)

Introduction
Related Works
Dimensional Collapse Caused by Data Heterogeneity
Empirical Observations on the Global Model
Empirical Observations on Local Models
A Theoretical Explanation for Dimensional Collapse
Setups and Notations
Analysis on Gradient Flow Dynamics
Mitigating Dimensional Collapse with FedDecorr
Experiments
Experimental Setups
FedDecorr Significantly Improves Baseline Methods
Ablation Study on the Number of Clients
Ablation Study on the Regularization Coefficient $\beta$
Ablation Study on the Number of Local Epochs
...and 22 more sections

Key Result

Theorem 1

Assume that the mild conditions stated in the appendix (sec:assumptions) hold. Let $\sigma_k(t)$ for $k \in [d]$ be the $k$-th largest singular value of $\Pi(t)$. Then [equation omitted], where $\mathbf{u}_{L+1, k}(t)$ is the $k$-th left singular vector of $W_{L+1}(t)$, $\mathbf{v}_{k}(t)$ is the $k$-th right singular vector of $\Pi(t)$, $M$ is a constant, and $G(t)$ is defined as [equation omitted], where $\mu_{c}$, $\mathbf{e}_{c}$

Figures (11)

Figure 1: (a) illustrates data heterogeneity in terms of number of samples per class. (b), (c), (d) show representations (normalized to the unit sphere) of global models trained under homogeneous data, heterogeneous data, and heterogeneous data with FedDecorr, respectively. Only (c) suffers dimensional collapse. (b), (c), (d) are produced with ResNet20 on CIFAR10. Best viewed in color.
Figure 2: Data heterogeneity causes dimensional collapse on (a) global models and (b) local models. We plot the singular values of the covariance matrix of representations in descending order. The $x$-axis ($k$) is the index of singular values and the $y$-axis is the logarithm of the singular values.
Figure 3: FedDecorr effectively mitigates dimensional collapse for (a-b) local models and (c-d) global models. For each heterogeneity parameter $\alpha \in \{0.01,0.05\}$, we apply FedDecorr and plot the singular values of the representation covariance matrix. The $x$-axis ($k$) is the index of singular values. With FedDecorr, the tail singular values are prevented from dropping to $0$ too rapidly.
Figure 4: Test accuracy (%) at each communication round. Results are averaged over 3 runs. Shaded areas denote one standard deviation above and below the mean.
Figure 5: Ablation study on $\beta$. We apply FedDecorr with different choices of $\beta$ on FedAvg.
...and 6 more figures
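
The singular-value plots described in Figures 2 and 3 can be reproduced in outline with the following NumPy sketch. It assumes representations are available as an `(n, d)` array; the function name `log_singular_values`, the `eps` offset that avoids `log(0)`, and the synthetic data are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def log_singular_values(z, eps=1e-12):
    """Log singular values of the representation covariance, in descending order.

    z is an (n, d) array of representations. eps is an assumed offset that
    keeps the logarithm finite when a singular value is exactly zero.
    """
    z = np.asarray(z, dtype=np.float64)
    centered = z - z.mean(axis=0)
    cov = centered.T @ centered / z.shape[0]  # (d, d) covariance matrix
    # np.linalg.svd returns singular values sorted from largest to smallest
    s = np.linalg.svd(cov, compute_uv=False)
    return np.log(s + eps)


rng = np.random.default_rng(0)
# Representations spanning all 16 dimensions
full = rng.standard_normal((1000, 16))
# Representations confined to a 4-dimensional subspace of the 16-dim space
basis = rng.standard_normal((4, 16))
low_rank = rng.standard_normal((1000, 4)) @ basis

log_full = log_singular_values(full)
log_low = log_singular_values(low_rank)
```

Plotting `log_low` against the index $k$ shows the tail dropping sharply after the fourth value, which is the signature of dimensional collapse that the figures report; `log_full` stays comparatively flat.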

Theorems & Definitions (8)

Theorem 1: Informal
Proposition 1
Lemma 1
proof
Lemma 2
proof
proof
proof
