Graph-Smoothed Bayesian Black-Box Shift Estimator and Its Information Geometry
Masanari Kimura
TL;DR
GS-B$^3$SE addresses label-shift estimation by replacing the brittle plug-in black-box shift estimator (BBSE) with a fully Bayesian model. This model uses a Laplacian-GMRF prior to tie the target prior $\bm{q}$ and each confusion-matrix column $\bm{C}_{:,i}$ together on a label-similarity graph. The authors prove posterior identifiability and an $N^{-1/2}$ contraction rate, with class-wise posterior variances shrinking as $1/(\lambda_2(\bm{L})N)$, and they establish robustness to graph misspecification. An information-geometric interpretation casts the estimator as a geodesically convex penalized likelihood on the $K$-simplex. This likelihood decomposes into a data-fit term and a graph regularizer controlled by the algebraic connectivity $\lambda_2(\bm{L})$. Empirically, GS-B$^3$SE yields sharper prior estimates and higher downstream Saerens-corrected accuracy on MNIST, CIFAR-10, and CIFAR-100. It requires only a frozen classifier, a small labeled validation set, and a precomputed label graph. The framework thus provides calibrated uncertainty and a practical post-processing step for deployments facing label drift with limited labeled data.
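The sketch below illustrates the "data-fit term plus graph regularizer" decomposition in NumPy. It is a minimal sketch under simplifying assumptions of our own, not the paper's method:

- The confusion matrix $\bm{C}$ is held fixed rather than given its own graph prior.
- The target log-priors $\bm{\eta}$ (our notation, with $\bm{q} = \mathrm{softmax}(\bm{\eta})$) receive a Laplacian penalty $\tfrac{\tau}{2}\bm{\eta}^\top\bm{L}\bm{\eta}$.
- Plain gradient ascent stands in for the HMC and block Newton-CG schemes.

The helper names `laplacian`, `algebraic_connectivity`, `penalized_objective` and `map_prior`, as well as the strength parameter `tau`, are hypothetical.

```python
import numpy as np


def laplacian(W):
    """Combinatorial Laplacian L = D - W of a symmetric, non-negative weight matrix."""
    W = np.asarray(W, dtype=float)
    return np.diag(W.sum(axis=1)) - W


def algebraic_connectivity(L):
    """Second-smallest eigenvalue lambda_2(L); it is positive iff the graph is connected."""
    return np.linalg.eigvalsh(L)[1]  # eigvalsh returns eigenvalues in ascending order


def softmax(eta):
    z = np.exp(eta - eta.max())
    return z / z.sum()


def penalized_objective(eta, C, counts, L, tau):
    """Data-fit term sum_j n_j log (C q)_j minus the graph penalty (tau/2) eta^T L eta."""
    mu = C @ softmax(eta)
    return counts @ np.log(mu) - 0.5 * tau * eta @ L @ eta


def map_prior(C, counts, L, tau=1.0, lr=0.2, steps=5000):
    """Hypothetical MAP sketch: gradient ascent on penalized_objective divided by N.

    C      : (K, K) column-stochastic confusion matrix, C[i, j] = P(yhat = i | y = j), held fixed.
    counts : length-K counts of predicted labels on the unlabelled target sample.
    """
    counts = np.asarray(counts, dtype=float)
    N = counts.sum()
    eta = np.zeros(C.shape[1])
    for _ in range(steps):
        q = softmax(eta)
        mu = C @ q
        g_q = C.T @ (counts / mu)          # gradient of the data-fit term w.r.t. q
        g_eta = q * g_q - q * (q @ g_q)    # chain rule through the softmax Jacobian diag(q) - q q^T
        g_eta -= tau * (L @ eta)           # gradient of -(tau/2) eta^T L eta
        eta += lr * g_eta / N              # dividing by N rescales the step, not the maximizer
    return softmax(eta)


# Toy usage: 3 classes on the path graph 0 - 1 - 2.
L = laplacian([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
lam2 = algebraic_connectivity(L)          # approximately 1.0 for the 3-node path
C = np.array([[0.8, 0.1, 0.1],
              [0.1, 0.8, 0.1],
              [0.1, 0.1, 0.8]])
counts = np.array([5200, 3100, 1700])     # exactly N * (C @ [0.6, 0.3, 0.1]) with N = 10000
q_mle = map_prior(C, counts, L, tau=0.0)       # no smoothing: approximately [0.6, 0.3, 0.1]
q_smooth = map_prior(C, counts, L, tau=1e4)    # strong smoothing pulls q toward uniform
```

Because $\bm{L}\mathbf{1} = \mathbf{0}$, the penalty ignores adding a constant to every log-prior, which is the same direction the softmax ignores. The regularizer therefore acts only on differences between adjacent classes. Its weakest non-constant direction is weighted by $\lambda_2(\bm{L})$, which is why the algebraic connectivity governs how strongly the graph shrinks the estimate.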
Abstract
Label shift adaptation aims to recover target class priors when the labelled source distribution $P$ and the unlabelled target distribution $Q$ share the same class-conditional distribution, $P(X \mid Y) = Q(X \mid Y)$, but differ in their class priors, $P(Y) \neq Q(Y)$. Classical black-box shift estimators (BBSE) invert the empirical confusion matrix of a frozen classifier. This produces a brittle point estimate that ignores both sampling noise and similarity among classes. We present Graph-Smoothed Bayesian BBSE (GS-B$^3$SE), a fully probabilistic alternative that places Laplacian-Gaussian priors on both the target log-priors and the confusion-matrix columns, tying them together on a label-similarity graph. The resulting posterior is tractable with HMC or a fast block Newton-CG scheme. We prove identifiability, $N^{-1/2}$ contraction, variance bounds that shrink with the graph's algebraic connectivity, and robustness to Laplacian misspecification. We also reinterpret GS-B$^3$SE through information geometry, showing that it generalizes existing shift estimators.
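For reference, the classical plug-in estimator that the abstract contrasts against can be sketched in a few lines of NumPy. This is one common variant, assumed here for illustration:

1. Estimate the column-stochastic confusion matrix $\hat{\bm{C}}_{ij} = \hat{P}(\hat{y} = i \mid y = j)$ on a labelled validation set.
2. Solve $\hat{\bm{C}}\bm{q} = \hat{\bm{\mu}}$ for the target prior, where $\hat{\bm{\mu}}$ is the empirical distribution of predicted labels on the target sample.
3. Clip negative entries and renormalize.

The function names `confusion_columns` and `bbse_plugin` are hypothetical.

```python
import numpy as np


def confusion_columns(y_true, y_pred, K):
    """Column-stochastic C with C[i, j] = P_hat(yhat = i | y = j) on a labelled validation set.

    Every class must appear at least once in y_true, otherwise a column is 0/0.
    """
    C = np.zeros((K, K))
    np.add.at(C, (np.asarray(y_pred), np.asarray(y_true)), 1.0)
    return C / C.sum(axis=0, keepdims=True)


def bbse_plugin(C, target_pred, K):
    """Plug-in estimate: solve C q = mu_hat, then clip negatives and renormalize.

    Clipping is a heuristic, not a Euclidean projection onto the simplex.
    Returns (q_clipped, q_raw); q_raw can leave the simplex when C is noisy or ill-conditioned.
    """
    mu_hat = np.bincount(np.asarray(target_pred), minlength=K) / len(target_pred)
    q_raw = np.linalg.solve(C, mu_hat)
    q = np.clip(q_raw, 0.0, None)
    return q / q.sum(), q_raw


# Toy example: 8 validation points, 2 classes, one mistake per class.
y_val_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_val_pred = [0, 0, 0, 1, 1, 1, 1, 0]
C_hat = confusion_columns(y_val_true, y_val_pred, K=2)            # [[0.75, 0.25], [0.25, 0.75]]

q_ok, _ = bbse_plugin(C_hat, [0, 0, 0, 0, 0, 1, 1, 1], K=2)       # [0.75, 0.25]
q_bad, q_raw = bbse_plugin(C_hat, [0, 0, 0, 0, 0, 0, 0, 1], K=2)  # raw [1.25, -0.25] -> [1.0, 0.0]
```

The second call shows the brittleness described above. With only eight validation points, the raw solution leaves the simplex. The clipped estimate then collapses to a point mass with no accompanying measure of uncertainty.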
