Bayesian-based Online Label Shift Estimation with Dynamic Dirichlet Priors
Jiawei Hu, Javier A. Barria
TL;DR
This work tackles label shift by modeling test priors with a Dirichlet distribution and solving for both the priors and the hyperparameters through EM-based optimization. It introduces FMAPLS, a batch Bayesian framework that jointly updates π and α, and its online variant, online-FMAPLS, for streaming data; both use a linear surrogate for efficient α updates. The theoretical analysis covers the Cramér-Rao bound (CRB) and a trade-off between convergence rate and estimation accuracy tied to the surrogate parameter c. Empirical results on CIFAR100 and ImageNet show substantial reductions in KL divergence and improved post-shift accuracy, especially under extreme class imbalance and uncertain priors, highlighting the robustness and scalability of the methods.
Abstract
Label shift, a prevalent challenge in supervised learning, arises when the class prior distribution of the test data differs from that of the training data, leading to significant degradation in classifier performance. To accurately estimate the test priors and enhance classification accuracy, we propose a Bayesian framework for label shift estimation, termed Full Maximum A Posteriori Label Shift (FMAPLS), along with its online version, online-FMAPLS. Leveraging batch and online Expectation-Maximization (EM) algorithms, these methods jointly and dynamically optimize the Dirichlet hyperparameters $\boldsymbol{\alpha}$ and the class priors $\boldsymbol{\pi}$, thereby overcoming the rigid constraints of the existing Maximum A Posteriori Label Shift (MAPLS) approach. Moreover, we introduce a linear surrogate function (LSF) to replace gradient-based hyperparameter updates, yielding closed-form solutions that reduce computational complexity while retaining asymptotic equivalence. The online variant substitutes the batch E-step with a stochastic approximation, enabling real-time adaptation to streaming data. Furthermore, our theoretical analysis reveals a fundamental trade-off between online convergence rate and estimation accuracy. Extensive experiments on the CIFAR100 and ImageNet datasets under shuffled long-tail and Dirichlet test priors demonstrate that FMAPLS and online-FMAPLS achieve up to 40% and 12% lower KL divergence, respectively, along with substantial improvements in post-shift accuracy over state-of-the-art baselines, particularly under severe class imbalance and distributional uncertainty. These results confirm the robustness, scalability, and suitability of the proposed methods for large-scale and dynamic learning scenarios.
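As an illustration of the kind of EM iteration the abstract builds on, the following is a minimal Python sketch, not the authors' FMAPLS implementation. It assumes calibrated classifier posteriors, a known training prior, and a fixed Dirichlet hyperparameter vector with every entry at least 1; the joint update of $\boldsymbol{\alpha}$ and the linear surrogate function are not reproduced. The function names, the argument names, and the step-size schedule $t^{-0.7}$ are hypothetical choices made for this example.

```python
import numpy as np

def em_map_label_shift(probs, source_prior, alpha, n_iter=100):
    """Batch EM estimate of the test prior with a fixed Dirichlet(alpha) prior.

    probs:        (N, K) classifier posteriors computed under the training prior
    source_prior: (K,) training class prior
    alpha:        (K,) Dirichlet hyperparameters, assumed >= 1 (kept fixed here)
    """
    probs = np.asarray(probs, dtype=float)
    source_prior = np.asarray(source_prior, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    n, k = probs.shape
    pi = source_prior.copy()
    for _ in range(n_iter):
        # E-step: reweight posteriors by the prior ratio, then renormalize per sample.
        w = probs * (pi / source_prior)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: mode of the Dirichlet posterior given the expected class counts.
        pi = (w.sum(axis=0) + alpha - 1.0) / (n + alpha.sum() - k)
    return pi

def online_em_label_shift(prob_stream, source_prior, step=lambda t: t ** -0.7):
    """Online variant: the batch E-step is replaced by a running average.

    The step-size schedule is an assumption of this sketch; no Dirichlet
    prior is applied in this function.
    """
    source_prior = np.asarray(source_prior, dtype=float)
    pi = source_prior.copy()
    s = source_prior.copy()  # running estimate of the expected responsibilities
    for t, p in enumerate(prob_stream, start=1):
        w = np.asarray(p, dtype=float) * (pi / source_prior)
        w /= w.sum()
        gamma = step(t)
        s = (1.0 - gamma) * s + gamma * w  # stochastic-approximation update
        pi = s / s.sum()
    return pi
```

With `alpha` set to all ones the Dirichlet prior is flat and the batch M-step reduces to the maximum-likelihood update. Larger entries pull the estimate toward the prior mean, which is the quantity that, according to the abstract, FMAPLS adapts dynamically rather than fixing in advance.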
