Addressing Label Shift in Distributed Learning via Entropy Regularization

Zhiyuan Wu; Changkyu Choi; Xiangcheng Cao; Volkan Cevher; Ali Ramezani-Kebrya

TL;DR

The paper addresses label shift in distributed learning, where data stays on client nodes, and proposes VRLS (Versatile Robust Label Shift), an entropy-regularized density-ratio estimator. VRLS is integrated into an importance-weighted empirical risk minimization (IW-ERM) framework to counter both intra- and inter-node shifts. It improves calibration of the estimated $p^{\text{tr}}(\boldsymbol{y}|\boldsymbol{x})$, enabling more accurate density ratios $p^{\text{te}}(\boldsymbol{y})/p^{\text{tr}}(\boldsymbol{y})$ and near-optimal global risk in multi-node settings. The authors provide finite-sample error bounds for the ratio estimates and convergence guarantees for IW-ERM under convex, smooth, and nonconvex regimes, while preserving privacy and minimizing communication. Empirically, VRLS-based IW-ERM improves test error by up to 20% in imbalanced label-shift scenarios on MNIST, Fashion-MNIST, and CIFAR-10, and scales from 5 to 200 nodes with results approaching an upper bound that uses the true density ratios.

Abstract

We address the challenge of minimizing true risk in multi-node distributed learning. These systems are frequently exposed to both inter-node and intra-node label shifts, which present a critical obstacle to effectively optimizing model performance while ensuring that data remains confined to each node. To tackle this, we propose the Versatile Robust Label Shift (VRLS) method, which enhances the maximum likelihood estimation of the test-to-train label density ratio. VRLS incorporates Shannon entropy-based regularization and adjusts the density ratio during training to better handle label shifts at test time. In multi-node learning environments, VRLS further extends its capabilities by learning and adapting density ratios across nodes, effectively mitigating label shifts and improving overall model performance. Experiments conducted on MNIST, Fashion-MNIST, and CIFAR-10 demonstrate the effectiveness of VRLS, which outperforms baselines by up to 20% in imbalanced settings. These results highlight the significant improvements VRLS offers in addressing label shifts. Our theoretical analysis further supports this by establishing high-probability bounds on estimation errors.
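To make the role of the test-to-train label density ratio concrete, the following is a minimal single-node sketch of an importance-weighted loss. It is not the authors' implementation: the ratio is computed from label marginals that are assumed to be known, whereas VRLS estimates it, and the function names `label_ratio` and `iw_cross_entropy` and the toy numbers are hypothetical.

```python
import math


def label_ratio(p_test, p_train):
    """Per-class ratio r(y) = p_te(y) / p_tr(y), assuming both marginals are known."""
    return [pte / ptr for pte, ptr in zip(p_test, p_train)]


def iw_cross_entropy(probs, labels, ratio):
    """Importance-weighted average cross-entropy over a batch of training samples.

    probs[i][k] is the predicted probability of class k for sample i,
    labels[i] is the true class of sample i, and ratio[k] is r(k).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Each sample's loss is scaled by the ratio of its own label.
        total += ratio[y] * -math.log(p[y])
    return total / len(labels)


# Toy setup (assumed values): class 1 is rare in training but common at test time.
p_train = [0.8, 0.2]
p_test = [0.5, 0.5]
ratio = label_ratio(p_test, p_train)  # approximately [0.625, 2.5]

probs = [[0.9, 0.1], [0.6, 0.4], [0.3, 0.7]]
labels = [0, 0, 1]
loss = iw_cross_entropy(probs, labels, ratio)
```

In this toy setup, samples from the under-represented class receive a weight above 1, so the weighted training loss more closely reflects the risk under the test label distribution.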
