Just rotate it! Uncertainty estimation in closed-source models via multiple queries

Konstantinos Pitas; Julyan Arbel

TL;DR

The paper tackles uncertainty estimation for closed-source image classifiers that do not expose post-softmax distributions. It queries the model multiple times on transformed inputs to estimate $p_A(x) = P_{T(x)}(f(T(x)) = A)$. It derives a Gaussian latent-noise model linking $p_A(x)$ to $p(A \mid x, f)$ and demonstrates that natural transformations such as rotations yield better calibration than Gaussian perturbations, achieving notable gains in ECE and AUROC on CIFAR-10/100 and ImageNet. A transfer-learning approach learns an empirical latent-noise distribution $F_n$ from open-source data and applies it to closed-source models via $p(A \mid x, f) = 1/(1+\exp(a F_n^{-1}(1-p_A(x))))$ to further improve calibration, sometimes matching or surpassing the best natural transformations. The work provides a practical framework for obtaining calibrated uncertainty estimates from opaque models and emphasizes the importance of aligning input perturbations with latent-space noise for reliable uncertainty quantification.
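The mapping from vote fraction to confidence in the TL;DR can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the clipping constant `eps`, and the nearest-rank quantile rule are assumptions made here. The standard normal quantile stands in for the Gaussian model, and a sorted list of samples stands in for the learned $F_n$.

```python
import math
from statistics import NormalDist


def empirical_quantile(sorted_samples, q):
    """Nearest-rank inverse of the empirical CDF F_n (assumed rule)."""
    n = len(sorted_samples)
    # Keep the index inside the sample for q close to 0 or 1.
    idx = min(n - 1, max(0, math.ceil(q * n) - 1))
    return sorted_samples[idx]


def confidence_from_vote_fraction(p_a, quantile_fn, a=1.0, eps=1e-6):
    """Evaluate p(A|x,f) = 1 / (1 + exp(a * F^{-1}(1 - p_A))).

    p_a is clipped to [eps, 1 - eps] (an assumption of this sketch) so that
    a unanimous vote does not ask for the quantile at exactly 0.
    """
    p_a = min(max(p_a, eps), 1.0 - eps)
    return 1.0 / (1.0 + math.exp(a * quantile_fn(1.0 - p_a)))


# Gaussian latent noise: F is the standard normal CDF.
gaussian_quantile = NormalDist().inv_cdf
print(confidence_from_vote_fraction(0.9, gaussian_quantile))

# Empirical latent noise: hypothetical samples, sorted once up front.
latent_noise = sorted([-2.0, -1.0, 0.0, 1.0, 2.0])
print(confidence_from_vote_fraction(
    0.9, lambda q: empirical_quantile(latent_noise, q)))
```

In both variants a vote fraction of one half maps to a confidence of one half when the noise distribution is symmetric around zero, and the confidence grows with $p_A$ for positive $a$.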

Abstract

We propose a simple and effective method to estimate the uncertainty of closed-source deep neural network image classification models. Given a base image, our method creates multiple transformed versions and uses them to query the top-1 prediction of the closed-source model. We demonstrate significant improvements in the calibration of uncertainty estimates compared to the naive baseline of assigning 100% confidence to all predictions. While we initially explore Gaussian perturbations, our empirical findings indicate that natural transformations, such as rotations and elastic deformations, yield even better-calibrated predictions. Furthermore, through empirical results and a straightforward theoretical analysis, we elucidate the reasons behind the superior performance of natural transformations over Gaussian noise. Leveraging these insights, we propose a transfer learning approach that further improves our calibration results.
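The query procedure described in the abstract can be sketched as follows. This is a hedged illustration under stated assumptions: `query_top1` stands for whatever call returns the closed-source model's top-1 label, `transform` for a random augmentation such as a rotation, and the toy model and toy "rotation" below are hypothetical stand-ins so that the example runs without any image library.

```python
import random
from collections import Counter


def estimate_vote_fraction(query_top1, transform, x, num_queries=50, seed=0):
    """Return (majority label A, fraction of queries that returned A)."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_queries):
        # Each query sees an independently transformed copy of the base input.
        votes[query_top1(transform(x, rng))] += 1
    label, count = votes.most_common(1)[0]
    return label, count / num_queries


# Hypothetical stand-ins: the "image" is a scalar, the "rotation" is a
# random shift, and the "model" thresholds its input at zero.
def toy_rotate(x, rng):
    return x + rng.uniform(-1.0, 1.0)


def toy_model(x):
    return "giraffe" if x > 0 else "gorilla"


label, p_a = estimate_vote_fraction(toy_model, toy_rotate, x=0.3)
print(label, p_a)
```

A base input far from the toy decision boundary receives a unanimous vote, while one near the boundary receives a split vote, which is the signal the method uses as an uncertainty estimate.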

Paper Structure (13 sections, 4 theorems, 16 equations, 5 figures, 4 tables)

Introduction
Categorical distribution via multiple queries
Gaussian model.
Gaussian model experiments
Two sources of error and the transfer learning model
Transfer Learning Model.
Transfer learning model experiments
Related work
Discussion
Additional proofs
Statistical correlations between Var, KS and calibration metrics
Experimental setup and hyperparameter ranges
Graphical illustration of the two test statistics (KS,Var)

Key Result

Proposition 2.1

Let $f = g \circ h$ be a neural network used for binary classification, where $h : \mathbb{R}^{d} \rightarrow \mathbb{R}^{d_{\eta}}$ is an encoder and $g : \mathbb{R}^{d_{\eta}} \rightarrow \mathcal{Y}$ is the final classification layer, $\boldsymbol{x}$ is an input image and $\epsilon_{\tau} \sim \

Figures (5)

Figure 1: Rotational insights for multiple queries. Top row: Querying a closed-source image classification model only once with a base image may yield an incorrect top-1 prediction. Despite the absence of additional information, the naive baseline is to assign $100\%$ confidence to this singular prediction. Bottom row: Querying the model multiple times with augmented versions of the base image produces the $\{\mathrm{giraffe}\}$ class twice and the $\{\mathrm{gorilla}\}$ class once. This is roughly equivalent to $66.6\%$ confidence. This observation should serve as an alert to a potential error, even when the true label is unknown.
Figure 2: Brier score with varying number of samples. Increasing the number of samples $S$ from $10$ to $50$ improves the Brier score consistently across datasets, architectures, and transformations.
Figure 3: Noise distributions in latent space $\epsilon_{\eta}$. (a) For non-pathological noise distributions, the fraction of samples from the top-1 class $p_A$ is commensurate with the margin of the decision boundary. (b) Error source 1: If the noise distributions $\epsilon_{\eta}$ and $\epsilon_{\eta}'$ in latent space for two signals $\boldsymbol{z}$ and $\boldsymbol{z}'$ are not identical, the inferred margins become incomparable. In this instance $\boldsymbol{z}'$ has a larger margin than $\boldsymbol{z}$, yet under $\epsilon_{\eta}'$, $p_A'<p_A$. (c) Error source 2: Different latent noise distributions correspond to different non-linear relationships between $p_A$ and the margin. Here, $\mathrm{margin}\;z' \approx 3 \times\mathrm{margin}\;z$. For the ellipsoid latent noise, $p_A' = p_A + \text{blue area} \approx p_A+1/4 \times p_A = 1.25 \times p_A$. However, for the circular latent noise, $p_A' = p_A + \text{blue and green area} \approx p_A+1/2 \times p_A = 1.5 \times p_A$. First, contrary to what one might initially guess, $p_A' \neq 3 \times p_A$ for either noise distribution. Second, the rate $a$ at which $p_A'$ changes, $p_A' = a \times p_A$, also differs depending on the latent noise distribution. Therefore, an accurate model of $\epsilon_{\eta}$ is crucial for comparing the margins of $\boldsymbol{z}$ and $\boldsymbol{z}'$.
Figure 4: Empirical cumulatives $F_n$ with quantiles for CIFAR-100, ResNet20 and CIFAR-10, ResNet18. For Gaussian noise on the input signals, the mean empirical cumulative $F_n$ differs the most from the cumulative of the normal $\Phi(x/a)$. Furthermore, for Gaussian noise, the empirical cumulatives $F_n$ for the different samples $\boldsymbol{x}$ exhibit the largest variability.
Figure 5: (a) Multiple cumulatives (b) Var statistic
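
The Brier score reported in Figure 2 can be illustrated with a short sketch. It assumes the top-1 form of the score (squared gap between the reported confidence and a 0/1 correctness indicator), which may differ from the exact definition used in the paper; the function name and the example values are hypothetical.

```python
def top1_brier(confidences, correct):
    """Mean squared gap between top-1 confidence and 0/1 correctness."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = 0.0
    for conf, is_correct in zip(confidences, correct):
        total += (conf - float(is_correct)) ** 2
    return total / len(confidences)


# Three predictions, the second one wrong.
correct = [True, False, True]

# Naive baseline: 100% confidence everywhere, so the score equals the error rate.
print(top1_brier([1.0, 1.0, 1.0], correct))

# Lower confidence on the wrong prediction (as in Figure 1) reduces the score.
print(top1_brier([1.0, 2.0 / 3.0, 1.0], correct))
```

Under this form, the naive baseline is penalized by a full unit for every mistake, so any method that lowers confidence on wrong predictions without lowering it much on right ones improves the score.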

Theorems & Definitions (7)

Proposition 2.1
Proof sketch
Proposition 4.1
Proposition 1.1
Proof
Proposition 1.2
Proof
