Bayesian-based Online Label Shift Estimation with Dynamic Dirichlet Priors

Jiawei Hu, Javier A. Barria

TL;DR

This work tackles label shift by modeling test priors with a Dirichlet distribution and solving for priors and hyperparameters through EM-based optimization. It introduces FMAPLS, a batch Bayesian framework that jointly updates π and α, and its online variant, online-FMAPLS, for streaming data; both use a linear surrogate for efficient α updates. The theoretical analysis covers the Cramér-Rao bound (CRB) and a trade-off between convergence rate and accuracy that is tied to the surrogate parameter c. Empirical results on CIFAR100 and ImageNet show substantial reductions in KL divergence and improved post-shift accuracy, especially under extreme class imbalance and uncertain priors, highlighting the methods' robustness and scalability.

Abstract

Label shift, a prevalent challenge in supervised learning, arises when the class prior distribution of test data differs from that of training data, leading to significant degradation in classifier performance. To accurately estimate the test priors and enhance classification accuracy, we propose a Bayesian framework for label shift estimation, termed Full Maximum A Posterior Label Shift (FMAPLS), along with its online version, online-FMAPLS. Leveraging batch and online Expectation-Maximization (EM) algorithms, these methods jointly and dynamically optimize Dirichlet hyperparameters $\boldsymbol{\alpha}$ and class priors $\boldsymbol{\pi}$, thereby overcoming the rigid constraints of the existing Maximum A Posterior Label Shift (MAPLS) approach. Moreover, we introduce a linear surrogate function (LSF) to replace gradient-based hyperparameter updates, yielding closed-form solutions that reduce computational complexity while retaining asymptotic equivalence. The online variant substitutes the batch E-step with a stochastic approximation, enabling real-time adaptation to streaming data. Furthermore, our theoretical analysis reveals a fundamental trade-off between online convergence rate and estimation accuracy. Extensive experiments on CIFAR100 and ImageNet datasets under shuffled long-tail and Dirichlet test priors demonstrate that FMAPLS and online-FMAPLS respectively achieve up to 40% and 12% lower KL divergence and substantial improvements in post-shift accuracy over state-of-the-art baselines, particularly under severe class imbalance and distributional uncertainty. These results confirm the robustness, scalability, and suitability of the proposed methods for large-scale and dynamic learning scenarios.
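
To make the EM idea concrete, the following is a minimal sketch of MAP-EM prior estimation under label shift with a fixed Dirichlet prior. It is not the authors' FMAPLS implementation: the function names `map_em_label_shift` and `kl_divergence`, the synthetic one-dimensional Gaussian data, and the choice to hold the Dirichlet hyperparameters fixed (rather than updating them jointly with the class priors) are all assumptions made for illustration.

```python
import numpy as np

def map_em_label_shift(probs, source_prior, alpha, n_iter=200, tol=1e-10):
    """Estimate test class priors by MAP-EM with a fixed Dirichlet prior.

    probs:        (N, K) source-classifier posteriors p_s(y | x_i) on test inputs
    source_prior: (K,) training class priors
    alpha:        (K,) Dirichlet hyperparameters, held fixed in this sketch
    """
    n, k = probs.shape
    pi = np.full(k, 1.0 / k)  # start from a uniform prior estimate
    for _ in range(n_iter):
        # E-step: reweight source posteriors by the prior ratio pi / source_prior.
        w = probs * (pi / source_prior)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: mode of the Dirichlet posterior over pi (clipped to stay positive).
        counts = np.clip(w.sum(axis=0) + alpha - 1.0, 1e-12, None)
        new_pi = counts / counts.sum()
        converged = np.abs(new_pi - pi).max() < tol
        pi = new_pi
        if converged:
            break
    return pi

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float(np.sum(p * np.log(p / q)))

# Synthetic check (assumed setup): three classes with 1-D Gaussian features.
rng = np.random.default_rng(0)
means = np.array([0.0, 3.0, 6.0])
source_prior = np.array([1 / 3, 1 / 3, 1 / 3])
true_prior = np.array([0.7, 0.2, 0.1])

labels = rng.choice(3, size=5000, p=true_prior)
x = rng.normal(loc=means[labels], scale=1.0)

# Posteriors of a classifier that is exact under the *source* prior.
lik = np.exp(-0.5 * (x[:, None] - means[None, :]) ** 2)
probs = lik * source_prior
probs /= probs.sum(axis=1, keepdims=True)

alpha = np.full(3, 2.0)  # mild symmetric prior, chosen arbitrarily
pi_hat = map_em_label_shift(probs, source_prior, alpha)
print("estimated test prior:", np.round(pi_hat, 3))
print("KL(true || estimate):", kl_divergence(true_prior, pi_hat))
```

With `alpha` set to all ones the M-step reduces to the plain maximum-likelihood EM update. According to the abstract, FMAPLS additionally re-estimates the hyperparameters during EM, which this sketch deliberately omits.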

Paper Structure

This paper contains 26 sections, 53 equations, 6 figures, 5 tables, and 2 algorithms.

Figures (6)

  • Figure 1: System architecture of the proposed Bayesian-based label shift estimation framework. The offline algorithm (FMAPLS) and its online variant (online-FMAPLS) estimate the class prior distribution of the test domain using high-dimensional Dirichlet modeling and EM-based inference, thus supporting both batch and streaming data processing.
  • Figure 2: KL divergence of online-FMAPLS algorithm evaluated on CIFAR100 dataset with varying shuffled long-tail test imbalance ratio $\rho_{\text{test}}$ and different training imbalance ratios $\rho_{\text{train}}$. (a) $\rho_{\text{train}}=5$, (b) $\rho_{\text{train}}=10$, (c) $\rho_{\text{train}}=20$, (d) $\rho_{\text{train}}=50$.
  • Figure 3: KL divergence of online-FMAPLS algorithm evaluated on CIFAR100 dataset with varying test prior Dirichlet hyperparameter $\bm{\alpha}$ and different training imbalance ratios $\rho_{\text{train}}$. (a) $\rho_{\text{train}}=5$, (b) $\rho_{\text{train}}=10$, (c) $\rho_{\text{train}}=20$, (d) $\rho_{\text{train}}=50$.
  • Figure 4: KL divergence of online-FMAPLS algorithm evaluated on long-tail ImageNet dataset with (a) varying shuffled long-tail test imbalance ratio $\rho_{\text{test}}$, (b) varying test Dirichlet hyperparameter $\bm{\alpha}$.
  • Figure 5: Ablation comparison of FMAPLS, online-FMAPLS, and MAPLS under different training prior distributions. (a) Shuffled long-tail imbalance ratio $\rho_{\text{test}}=0.02$, (b) Test Dirichlet hyperparameter $\bm{\alpha}_{\text{test}}=\bm{1}$.
  • ...and 1 more figure