
Graph-Smoothed Bayesian Black-Box Shift Estimator and Its Information Geometry

Masanari Kimura

TL;DR

GS-B$^3$SE addresses label-shift estimation by replacing the brittle plug-in BBSE with a fully Bayesian model that ties the target prior $\bm{q}$ and each confusion-matrix column $\bm{C}_{:,i}$ to a label-similarity graph via a Laplacian-GMRF prior. The authors prove posterior identifiability and an $N^{-1/2}$ contraction rate, with class-wise variances shrinking as $1/(\lambda_2(\bm{L})N)$, and establish robustness to graph misspecification. An information-geometric interpretation casts the estimator as a geodesically convex penalized likelihood on the $K$-simplex that decomposes into a data-fit term and a graph regularizer controlled by the algebraic connectivity $\lambda_2(\bm{L})$. Empirically, GS-B$^3$SE yields sharper priors and improved downstream Saerens-corrected accuracy on MNIST, CIFAR-10, and CIFAR-100, while requiring only a frozen classifier, a small validation set, and a precomputed label graph. The framework thus provides calibrated uncertainty and a practical post-processing step for deployments with label drift and limited labeled data.
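The downstream "Saerens-corrected" evaluation reweights the frozen classifier's source posteriors by the ratio of estimated target priors to source priors, following the prior-adjustment rule of Saerens et al. The NumPy sketch below assumes the source priors $p$ and an estimated target prior $q$ (for example, a GS-B$^3$SE posterior mean) are given; the function name `saerens_correct` is hypothetical and not taken from the paper.

```python
import numpy as np

def saerens_correct(probs_src: np.ndarray, p_src: np.ndarray, q_tgt: np.ndarray) -> np.ndarray:
    """Adjust source posteriors p_P(y|x) to a new class prior q (sketch).

    probs_src: (n, K) softmax outputs of the frozen classifier.
    p_src:     (K,) source class priors.
    q_tgt:     (K,) estimated target class priors.
    """
    w = q_tgt / p_src                                       # per-class ratio q(y) / p(y)
    adjusted = probs_src * w                                # broadcast the ratio over all rows
    return adjusted / adjusted.sum(axis=1, keepdims=True)   # renormalise each row to sum to 1

# Toy usage: balanced source, target skewed towards class 1.
probs = np.array([[0.6, 0.4], [0.5, 0.5]])
p = np.array([0.5, 0.5])
q = np.array([0.2, 0.8])
print(saerens_correct(probs, p, q).argmax(axis=1))  # → [1 1]
```

Here the first sample's decision flips from class 0 to class 1 once the skewed target prior is taken into account; the quality of this correction depends directly on how well $q$ is estimated.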

Abstract

Label shift adaptation aims to recover target class priors when the labelled source distribution $P$ and the unlabelled target distribution $Q$ share $P(X \mid Y) = Q(X \mid Y)$ but $P(Y) \neq Q(Y)$. Classical black-box shift estimators invert an empirical confusion matrix of a frozen classifier, producing a brittle point estimate that ignores sampling noise and similarity among classes. We present Graph-Smoothed Bayesian BBSE (GS-B$^3$SE), a fully probabilistic alternative that places Laplacian-Gaussian priors on both target log-priors and confusion-matrix columns, tying them together on a label-similarity graph. The resulting posterior is tractable with HMC or a fast block Newton-CG scheme. We prove identifiability, $N^{-1/2}$ contraction, variance bounds that shrink with the graph's algebraic connectivity, and robustness to Laplacian misspecification. We also reinterpret GS-B$^3$SE through information geometry, showing that it generalizes existing shift estimators.
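For reference, the plug-in estimator that GS-B$^3$SE replaces can be sketched in a few lines. This sketch assumes the column-stochastic convention $C_{j,i} = P(\hat{y}=j \mid y=i)$ used in Lemma 1, so the target distribution of predicted labels satisfies $\bm{\mu} = \bm{C}\bm{q}$ and the point estimate is $\hat{\bm{q}} = \hat{\bm{C}}^{-1}\hat{\bm{\mu}}$. The function name `bbse_plugin` and the final clip-and-renormalise step are illustrative choices, not the paper's code.

```python
import numpy as np

def bbse_plugin(val_true, val_pred, tgt_pred, K):
    """Plug-in label-shift estimate q_hat = C_hat^{-1} mu_hat (sketch)."""
    # C[j, i] estimates P(y_hat = j | y = i) from the labelled validation set;
    # assumes every class i occurs at least once, otherwise a column is 0/0.
    counts = np.zeros((K, K))
    np.add.at(counts, (val_pred, val_true), 1.0)
    C = counts / counts.sum(axis=0, keepdims=True)
    # mu[j]: fraction of unlabelled target points the frozen classifier labels j.
    mu = np.bincount(tgt_pred, minlength=K) / len(tgt_pred)
    # Invert mu = C q; np.linalg.solve raises LinAlgError if C is singular.
    q = np.linalg.solve(C, mu)
    # Ad hoc repair (assumption): clip negative mass and renormalise onto the simplex.
    q = np.clip(q, 0.0, None)
    return q / q.sum()

# Toy check: C = [[0.8, 0.1], [0.2, 0.9]] and true target prior q = [0.3, 0.7],
# so the target predictions follow mu = C q = [0.31, 0.69].
val_true = np.array([0] * 10 + [1] * 10)
val_pred = np.array([0] * 8 + [1] * 2 + [0] * 1 + [1] * 9)
tgt_pred = np.array([0] * 31 + [1] * 69)
print(np.round(bbse_plugin(val_true, val_pred, tgt_pred, K=2), 3))  # → [0.3 0.7]
```

With only 20 validation points, $\hat{\bm{C}}$ is noisy and the inversion propagates that noise into $\hat{\bm{q}}$ with no error bars; this is the brittleness the Bayesian, graph-smoothed treatment is meant to address.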

Paper Structure

This paper contains 19 sections, 8 theorems, 105 equations, 4 tables.

Key Result

Lemma 1

Let $\bm{C}$ and $\bm{C}'$ be two column-stochastic matrices with strictly positive entries: $C_{j,i} > 0$, $C'_{j,i} > 0$, $\sum^K_{j=1}C_{j,i} = \sum^K_{j=1}C'_{j,i} = 1$ for $1 \leq i \leq K$. In addition, assume $\bm{C}$ and $\bm{C}'$ are invertible, or equivalently, $\det \bm{C} \neq 0$ and $\det \bm{C}' \neq 0$. Suppose that, for every choice of the sample sizes $\{n^S_i\}^K_{i=1}$ and $n'$, …
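The hypotheses of Lemma 1 (strictly positive entries, column-stochasticity, invertibility) are easy to check numerically. The sketch below uses a hypothetical helper, `satisfies_lemma1_hypotheses`, and a rank test instead of comparing $\det \bm{C}$ with zero, since determinants of nearly singular matrices can be tiny without being exactly zero. It also illustrates, under the assumption $\bm{\mu} = \bm{C}\bm{q}$, why invertibility matters: a singular $\bm{C}$ maps distinct priors to the same predicted-label distribution.

```python
import numpy as np

def satisfies_lemma1_hypotheses(C, tol=1e-9):
    """Check the stated hypotheses of Lemma 1 for a single K x K matrix."""
    C = np.asarray(C, dtype=float)
    if C.ndim != 2 or C.shape[0] != C.shape[1]:
        return False
    strictly_positive = bool(np.all(C > 0))
    column_stochastic = bool(np.allclose(C.sum(axis=0), 1.0, atol=tol))
    # Rank test via SVD; numerically more robust than comparing det(C) with 0.
    invertible = bool(np.linalg.matrix_rank(C) == C.shape[0])
    return strictly_positive and column_stochastic and invertible

C_good = np.array([[0.8, 0.1], [0.2, 0.9]])
C_bad = np.full((2, 2), 0.5)   # positive and column-stochastic, but singular
print(satisfies_lemma1_hypotheses(C_good), satisfies_lemma1_hypotheses(C_bad))  # → True False
# With the singular matrix, two different priors give the same mu = C q,
# so q cannot be recovered from the predicted-label distribution alone.
print(C_bad @ np.array([0.3, 0.7]), C_bad @ np.array([0.9, 0.1]))
```

Note that the identity matrix fails the check: it is invertible but not strictly positive, which the lemma explicitly requires.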

Theorems & Definitions (16)

  • Lemma 1
  • Lemma 2
  • Proposition 1
  • Theorem 1
  • Corollary 1
  • Proposition 2
  • Theorem 2: Geodesic convexity of $F$
  • Proposition 3: Natural-gradient flow of the penalised objective
  • proof : Proof of the identifiability lemma
  • proof : Proof of the support-condition lemma
  • ...and 6 more
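The penalised objective referred to above combines a data-fit term with a graph regularizer governed by $\lambda_2(\bm{L})$. The sketch below is a deliberately simplified MAP version, not the paper's estimator: it holds the confusion matrix $\bm{C}$ fixed (the paper also places a graph prior on its columns), parameterises $\bm{q} = \mathrm{softmax}(\bm{\theta})$, and runs plain gradient ascent rather than HMC or block Newton-CG. The ridge $\epsilon \bm{I}$ added to the Laplacian, the step-size rule, and all function names are assumptions made for illustration.

```python
import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())   # shift for numerical stability
    return z / z.sum()

def laplacian(W):
    # Combinatorial graph Laplacian L = D - W of a symmetric similarity matrix W.
    return np.diag(W.sum(axis=1)) - W

def algebraic_connectivity(W):
    # lambda_2(L): second-smallest eigenvalue (eigvalsh returns them in ascending order).
    return np.linalg.eigvalsh(laplacian(W))[1]

def graph_smoothed_map(C, m, W, tau=1.0, eps=1e-3, iters=2000):
    """Gradient ascent on  sum_j m_j log (C q)_j - (tau/2) theta^T (L + eps I) theta,
    with q = softmax(theta) and the confusion matrix C held fixed (simplification)."""
    K = C.shape[0]
    P = laplacian(W) + eps * np.eye(K)      # eps*I: assumed ridge making the prior proper
    step = 1.0 / (m.sum() + tau * np.linalg.eigvalsh(P)[-1])   # conservative fixed step
    theta = np.zeros(K)                     # start from the uniform prior
    for _ in range(iters):
        q = softmax(theta)
        g = C.T @ (m / (C @ q))             # gradient of the data-fit term w.r.t. q
        grad = q * (g - q @ g) - tau * P @ theta   # chain rule through softmax, plus penalty
        theta += step * grad
    return softmax(theta)

C = np.array([[0.8, 0.1], [0.2, 0.9]])
m = np.array([31.0, 69.0])                  # counts of target predictions per class
W = np.array([[0.0, 1.0], [1.0, 0.0]])      # toy label-similarity graph
print(algebraic_connectivity(W))            # lambda_2 = 2 for two nodes joined by weight 1
print(graph_smoothed_map(C, m, W, tau=0.0)) # close to the plug-in estimate [0.3, 0.7]
print(graph_smoothed_map(C, m, W, tau=50.0))  # pulled towards the uniform prior
```

With $\tau = 0$ the sketch reduces to maximum likelihood and recovers the plug-in solution when it lies inside the simplex; increasing $\tau$ smooths $\bm{\theta}$ along the graph, and a better-connected graph (larger $\lambda_2$) penalises disagreement between neighbouring classes more strongly.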