Online Bayesian Imbalanced Learning with Bregman-Calibrated Deep Networks

Zahir Alsulaimawi

TL;DR

This work introduces Online Bayesian Imbalanced Learning (OBIL), a framework that decouples likelihood-ratio estimation from deployment priors to enable real-time adaptation under distribution shift without retraining. By leveraging the invariant property of the likelihood ratio and the Bregman-divergence connection to posterior calibration, OBIL trains an ensemble of Bregman-calibrated networks on associated problems and online-adjusts decision thresholds using unlabeled data, with finite-sample regret $O\left(\sqrt{T \log T}\right)$. The approach combines a theoretically grounded offline LR estimator with a robust online prior-tracking mechanism, augmented by stability and calibration checks, and demonstrates strong performance under severe prior shifts on benchmark and medical datasets. The results show that OBIL maintains robust F1 scores where traditional rebalancing or post-hoc methods fail, and provide practical guidance for calibration, hyperparameter choices, and deployment constraints. Together, these contributions advance principled, online imbalanced learning capable of handling deployment-time priors without labeled target data.
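The decoupling described above can be illustrated with a minimal sketch. It assumes a binary problem, a calibrated minority-class posterior produced under a known training prior, and hypothetical function names (`likelihood_ratio`, `bayes_threshold`, `decide`) that do not come from the paper. The likelihood ratio is recovered by dividing the posterior odds by the training prior odds, and a new deployment prior or cost structure changes only the threshold.

```python
import numpy as np

def likelihood_ratio(posterior, train_prior, eps=1e-12):
    """Recover p(x|y=1)/p(x|y=0) from a calibrated posterior P(y=1|x).

    Posterior odds = likelihood ratio * prior odds, so dividing the
    posterior odds by the training prior odds removes the training prior.
    """
    p = np.clip(np.asarray(posterior, dtype=float), eps, 1.0 - eps)
    return (p / (1.0 - p)) * ((1.0 - train_prior) / train_prior)

def bayes_threshold(deploy_prior, cost_fp=1.0, cost_fn=1.0):
    """Likelihood-ratio threshold of the Bayes rule for a given deployment prior.

    cost_fp and cost_fn are illustrative misclassification costs.
    """
    return ((1.0 - deploy_prior) * cost_fp) / (deploy_prior * cost_fn)

def decide(posterior, train_prior, deploy_prior, cost_fp=1.0, cost_fn=1.0):
    """Predict the minority class (1) when the likelihood ratio exceeds the threshold."""
    lr = likelihood_ratio(posterior, train_prior)
    return (lr > bayes_threshold(deploy_prior, cost_fp, cost_fn)).astype(int)

# The same network output leads to different decisions under different priors.
scores = [0.5]                      # posterior from a model trained with prior 0.1
balanced = decide(scores, train_prior=0.1, deploy_prior=0.5)
rarer = decide(scores, train_prior=0.1, deploy_prior=0.05)
```

The network is never retrained in this sketch; only the scalar returned by `bayes_threshold` moves when the deployment prior or the costs change.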

Abstract

Class imbalance remains a fundamental challenge in machine learning, where standard classifiers exhibit severe performance degradation on minority classes. Although existing approaches address imbalance through resampling or cost-sensitive learning during training, they require retraining or access to labeled target data when class distributions shift at deployment time, a common occurrence in real-world applications such as fraud detection, medical diagnosis, and anomaly detection. We present Online Bayesian Imbalanced Learning (OBIL), a principled framework that decouples likelihood-ratio estimation from class-prior assumptions, enabling real-time adaptation to distribution shifts without model retraining. Our approach builds on the established connection between Bregman divergences and proper scoring rules to show that deep networks trained with such losses produce posterior probability estimates from which prior-invariant likelihood ratios can be extracted. We prove that these likelihood-ratio estimates remain valid under arbitrary changes in class priors and cost structures, requiring only a threshold adjustment for optimal Bayes decisions. We derive finite-sample regret bounds demonstrating that OBIL achieves $O(\sqrt{T \log T})$ regret against an oracle with perfect prior knowledge. Extensive experiments on benchmark datasets and medical diagnosis benchmarks under simulated deployment shifts demonstrate that OBIL maintains robust performance under severe distribution shifts, outperforming state-of-the-art methods in F1 score when test distributions deviate significantly from the training conditions.
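The oracle in the regret bound knows the deployment prior, whereas a deployed system has to estimate it from unlabeled data. The sketch below shows one standard way to do that: an EM fixed-point iteration for the mixing proportion given fixed likelihood ratios. This is an assumption chosen for illustration, not a reproduction of OBIL's own prior tracker, and the function name `estimate_prior_em` and the synthetic Gaussian data are hypothetical.

```python
import numpy as np

def estimate_prior_em(lr, init_prior=0.5, n_iter=200, tol=1e-8):
    """Estimate the minority-class prior from unlabeled likelihood ratios.

    E-step: posterior of class 1 for each sample under the current prior.
    M-step: the new prior is the mean of those posteriors.
    """
    lr = np.asarray(lr, dtype=float)
    prior = float(init_prior)
    for _ in range(n_iter):
        resp = prior * lr / (prior * lr + (1.0 - prior))
        new_prior = float(resp.mean())
        if abs(new_prior - prior) < tol:
            return new_prior
        prior = new_prior
    return prior

# Synthetic check: class 0 ~ N(0, 1), class 1 ~ N(2, 1), true minority prior 0.2.
rng = np.random.default_rng(0)
true_prior = 0.2
labels = rng.random(20000) < true_prior            # used only to generate data
x = rng.normal(loc=np.where(labels, 2.0, 0.0), scale=1.0)
lr = np.exp(2.0 * x - 2.0)                         # exact N(2,1)/N(0,1) density ratio
estimated_prior = estimate_prior_em(lr)
```

In this toy setting the likelihood ratio is known in closed form; with a trained network it would come from the calibrated posterior instead, and the estimate would inherit any calibration error. A sliding window over recent samples would turn the same update into an online tracker.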

Paper Structure

This paper contains 79 sections, 17 theorems, 66 equations, 6 figures, 11 tables, and 2 algorithms.

Introduction
The Bayesian Perspective: Likelihood Ratios as Invariant Quantities
Bregman Divergences and Posterior Probability Estimation
Contributions
Paper Organization
Related Work
Class Imbalance in Machine Learning
Data-Level Methods
Algorithm-Level Methods
Ensemble Methods
Distribution Shift and Domain Adaptation
Likelihood Ratio Estimation
Bayesian Approaches to Imbalanced Learning
Logit Adjustment and Post-Hoc Correction
Preliminaries
...and 64 more sections

Key Result

Theorem 3

A loss function $C(o, t)$ is a proper scoring rule if and only if it can be written as a Bregman divergence (plus terms independent of $o$). Equivalently, $C$ is proper if and only if $\frac{\partial C(o, t)}{\partial o} = g(o)\,(o - t)$ for some function $g(o) > 0$.
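A quick numerical check can make the characterization concrete. The sketch below assumes the derivative condition takes the form $\partial C / \partial o = g(o)(o - t)$ with $g(o) > 0$, and tests whether the implied $g$ is positive and the same for targets $t = 0$ and $t = 1$. Agreement at two targets is only a necessary check, not a proof, and the helper names are hypothetical.

```python
import numpy as np

def implied_g(loss, o, t, h=1e-6):
    """Central-difference estimate of (dC/do) / (o - t)."""
    dC = (loss(o + h, t) - loss(o - h, t)) / (2.0 * h)
    return dC / (o - t)

def squared_error(o, t):
    return (o - t) ** 2

def log_loss(o, t):
    return -(t * np.log(o) + (1.0 - t) * np.log(1.0 - o))

def absolute_error(o, t):
    return np.abs(o - t)

# Outputs strictly inside (0, 1) so that o - t is never zero for t in {0, 1}.
outputs = np.linspace(0.1, 0.9, 9)

def passes_condition(loss):
    """True if the implied g is positive and does not depend on the target."""
    g0 = implied_g(loss, outputs, 0.0)
    g1 = implied_g(loss, outputs, 1.0)
    return bool(np.allclose(g0, g1, rtol=1e-4) and np.all(g0 > 0))

results = {
    "squared_error": passes_condition(squared_error),
    "log_loss": passes_condition(log_loss),
    "absolute_error": passes_condition(absolute_error),
}
```

Squared error gives $g(o) = 2$ and log loss gives $g(o) = 1 / (o(1 - o))$, both independent of $t$. Absolute error yields $1/o$ for $t = 0$ and $1/(1 - o)$ for $t = 1$, so it fails the check, consistent with it not being a proper scoring rule.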

Figures (6)

Figure 1: F1-score across the full range of test imbalance ratios on (a) Yeast4 and (b) Mammography. OBIL's advantage over baselines grows with shift severity. Error bars show $\pm 1$ standard deviation over 10 runs.
Figure 2: Online adaptation under abrupt prior shift at $t = 500$. (a) True vs. estimated minority prior with 80% confidence band. (b) Corresponding threshold tracking. (c) Cumulative F1-score comparing OBIL, oracle, BBSE, and SMOTE. Shaded regions show 10th--90th percentile over 10 runs.
Figure 3: Calibration and likelihood ratio quality on Mammography. (a) Reliability diagrams for three loss configurations. (b) Log-likelihood ratio estimates vs. reference values; Bregman-calibrated estimates show tight scatter while uncalibrated estimates exhibit large dispersion at extremes.
Figure 4: Error propagation from calibration error to likelihood ratio error (see the error-propagation theorem). (a) Relative LR error bound vs. true posterior for several $\epsilon$ values. Dotted line marks the 15% threshold corresponding to ECE $< 0.05$. (b) Full error landscape with contour lines at key thresholds.
Figure 5: Empirical regret under non-stationary priors ($\delta = 0.002$, $T = 2000$). (a) Cumulative regret; OBIL tracks the $O(\sqrt{T \log T})$ bound while non-adaptive methods grow linearly. (b) Per-step regret (50-step moving average); OBIL converges toward zero. Shaded regions show 20th--80th percentile over 15 runs.
...and 1 more figure

Theorems & Definitions (39)

Definition 1: Bregman Divergence
Definition 2: Proper Scoring Rule
Theorem 3: Savage, 1971; Cid-Sueiro et al., 1999
Definition 4: Associated Problem
Theorem 5: Transferability of Likelihood Ratio Estimates
proof
Definition 6: Bregman-Calibrated Network
Theorem 7: Characterization of Bregman-Calibrated Losses
proof
Corollary 8: Architecture Independence
...and 29 more
