
Just rotate it! Uncertainty estimation in closed-source models via multiple queries

Konstantinos Pitas, Julyan Arbel

TL;DR

The paper tackles uncertainty estimation for closed-source image classifiers that do not expose post-softmax distributions by using multiple queries on transformed inputs to estimate p_A(x) = P_{T(x)}( f(T(x)) = A ). It derives a Gaussian latent-noise model linking p_A(x) to p(A|x,f) and demonstrates that natural transformations such as rotations yield better calibration than Gaussian perturbations, achieving notable gains in ECE and AUROC on CIFAR-10/100 and ImageNet. A transfer-learning approach learns an empirical latent-noise distribution F_n from open-source data and applies it to closed-source models via p(A|x,f) = 1/(1+exp(a F_n^{-1}(1-p_A(x)))) to further improve calibration, sometimes matching or surpassing the best natural transformations. The work provides a practical framework for obtaining calibrated uncertainty estimates from opaque models and emphasizes the importance of aligning input perturbations with latent-space noise for reliable uncertainty quantification.
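The transfer-learning mapping quoted above can be sketched in a few lines. This sketch assumes that $F_n$ is the empirical CDF of scalar latent-noise samples collected from an open-source model and that the caller supplies the scale $a$. How the paper gathers and normalizes those samples is not reproduced here, so `noise_samples` and `a` are hypothetical inputs.

```python
import math
from typing import Sequence


def empirical_quantile(samples: Sequence[float], q: float) -> float:
    """Generalized inverse of the empirical CDF F_n: smallest sample s with F_n(s) >= q."""
    xs = sorted(samples)
    n = len(xs)
    # F_n(xs[k]) >= (k + 1) / n, so take the smallest k with (k + 1) / n >= q.
    # The tiny tolerance guards against floating-point round-up in q * n.
    k = max(math.ceil(q * n - 1e-12) - 1, 0)
    return xs[min(k, n - 1)]


def calibrated_confidence(p_a: float, noise_samples: Sequence[float], a: float = 1.0) -> float:
    """p(A|x,f) = 1 / (1 + exp(a * F_n^{-1}(1 - p_A(x)))).

    ASSUMPTION (hypothetical inputs): noise_samples are scalar latent-noise draws
    gathered from an open-source model; the scale `a` is given by the caller.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must lie in [0, 1]")
    z = a * empirical_quantile(noise_samples, 1.0 - p_a)
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Example: p_A = 2/3 from multiple queries, with a small symmetric noise sample.
print(calibrated_confidence(2 / 3, [-2.0, -1.0, 0.0, 1.0, 2.0]))
```

With a finite sample, $F_n^{-1}(0)$ is the smallest noise draw. As a result, even unanimous votes ($p_A = 1$) map to a confidence below 1, unlike the naive 100% baseline.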

Abstract

We propose a simple and effective method to estimate the uncertainty of closed-source deep neural network image classification models. Given a base image, our method creates multiple transformed versions and uses them to query the top-1 prediction of the closed-source model. We demonstrate significant improvements in the calibration of uncertainty estimates compared to the naive baseline of assigning 100% confidence to all predictions. While we initially explore Gaussian perturbations, our empirical findings indicate that natural transformations, such as rotations and elastic deformations, yield even better-calibrated predictions. Furthermore, through empirical results and a straightforward theoretical analysis, we elucidate the reasons behind the superior performance of natural transformations over Gaussian noise. Leveraging these insights, we propose a transfer learning approach that further improves our calibration results.
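The multi-query procedure can be sketched as follows. The closed-source model is abstracted as a hypothetical `query_top1` callable that returns only a label. Taking the majority label as $A$ is an assumption of this sketch. In practice each transform would be, for example, a rotation by a random angle (Pillow's `Image.rotate` is one option), but transforms are left abstract here.

```python
from collections import Counter
from typing import Any, Callable, Hashable, Sequence, Tuple


def multi_query_confidence(
    image: Any,
    query_top1: Callable[[Any], Hashable],
    transforms: Sequence[Callable[[Any], Any]],
) -> Tuple[Hashable, float]:
    """Estimate p_A(x) = P_T(f(T(x)) = A) by querying a black-box model on T(x).

    ASSUMPTION: A is the majority top-1 label over the transformed queries.
    """
    if not transforms:
        raise ValueError("need at least one transform")
    # Only top-1 labels are observed, so count how often each label comes back.
    votes = Counter(query_top1(t(image)) for t in transforms)
    label, count = votes.most_common(1)[0]
    return label, count / len(transforms)


# Toy stand-ins (hypothetical): a scalar "image" and shift "transforms".
def toy_model(v: float) -> str:
    return "gorilla" if v > 0 else "giraffe"


shifts = [lambda x, d=d: x + d for d in (-1, 0, 1)]
print(multi_query_confidence(0, toy_model, shifts))
```

The toy queries return giraffe, giraffe and gorilla, mirroring Figure 1. The function therefore reports "giraffe" with $p_A = 2/3$.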

Paper Structure

This paper contains 13 sections, 4 theorems, 16 equations, 5 figures, and 4 tables.

Key Result

Proposition 2.1

Let $f = g \circ h$ be a neural network used for binary classification, where $h : \mathbb{R}^{d} \rightarrow \mathbb{R}^{d_{\eta}}$ is an encoder and $g : \mathbb{R}^{d_{\eta}} \rightarrow \mathcal{Y}$ is the final classification layer, $\boldsymbol{x}$ is an input image and $\epsilon_{\tau} \sim \ldots$
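The chain from $p_A$ to a calibrated probability can be sketched under simplifying assumptions of our own, which may differ from the exact conditions of Proposition 2.1. First, assume the transformation induces additive scalar noise $\epsilon$, with a continuous, strictly increasing CDF $F$, on the signed latent margin $m$ of the top-1 class. Second, assume the model's probability is a logistic function of $m$ with scale $a$. Under these assumptions:

$$
\begin{aligned}
p_A(\boldsymbol{x}) &= P(m + \epsilon > 0) = 1 - F(-m)
\quad\Longrightarrow\quad m = -F^{-1}\bigl(1 - p_A(\boldsymbol{x})\bigr), \\
p(A \mid \boldsymbol{x}, f) &= \frac{1}{1 + \exp(-a\,m)}
= \frac{1}{1 + \exp\bigl(a\,F^{-1}(1 - p_A(\boldsymbol{x}))\bigr)}.
\end{aligned}
$$

For Gaussian noise, $F(t) = \Phi(t/\sigma)$, and the symmetry of $\Phi$ gives $m = \sigma\,\Phi^{-1}(p_A(\boldsymbol{x}))$. Replacing $F$ with an empirical CDF $F_n$ yields the transfer-learning formula stated in the TL;DR.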

Figures (5)

  • Figure 1: Rotational insights for multiple queries. Top row: Querying a closed-source image classification model only once with a base image may yield an incorrect top-1 prediction. In the absence of additional information, the naive baseline is to assign $100\%$ confidence to this single prediction. Bottom row: Querying the model multiple times with augmented versions of the base image produces the $\{\mathrm{giraffe}\}$ class twice and the $\{\mathrm{gorilla}\}$ class once, which corresponds to a confidence of $2/3 \approx 66.7\%$. This disagreement should serve as a warning of a potential error, even when the true label is unknown.
  • Figure 2: Brier score with varying number of samples. Increasing the number of samples $S$ from $10$ to $50$ improves the Brier score consistently across datasets, architectures, and transformations.
  • Figure 3: Noise distributions in latent space $\epsilon_{\eta}$. (a) For non-pathological noise distributions, the fraction $p_A$ of samples assigned to the top-1 class is commensurate with the margin of the decision boundary. (b) Error source 1: If the latent noise distributions $\epsilon_{\eta}$ and $\epsilon_{\eta}'$ for two signals $\boldsymbol{z}$ and $\boldsymbol{z}'$ are not identical, the inferred margins become incomparable. In this instance, $\boldsymbol{z}'$ has a larger margin than $\boldsymbol{z}$, yet under $\epsilon_{\eta}'$, $p_A' < p_A$. (c) Error source 2: Different latent noise distributions correspond to different non-linear relationships between $p_A$ and the margin. Here, $\mathrm{margin}(\boldsymbol{z}') \approx 3 \times \mathrm{margin}(\boldsymbol{z})$. For the ellipsoidal latent noise, $p_A' = p_A + \text{blue area} \approx p_A + \tfrac{1}{4} p_A = 1.25\, p_A$. However, for the circular latent noise, $p_A' = p_A + \text{blue and green area} \approx p_A + \tfrac{1}{2} p_A = 1.5\, p_A$. First, contrary to what one might initially guess, $p_A' \neq 3 \times p_A$ for either noise distribution. Second, the factor $a$ in $p_A' = a \times p_A$ also differs depending on the latent noise distribution. Therefore, an accurate model of $\epsilon_{\eta}$ is crucial for comparing the margins of $\boldsymbol{z}$ and $\boldsymbol{z}'$.
  • Figure 4: Empirical cumulatives $F_n$ with quantiles for CIFAR-100 with ResNet20 and CIFAR-10 with ResNet18. For Gaussian noise on the input signals, the mean empirical cumulative $F_n$ differs the most from the cumulative of the normal, $\Phi(x/a)$. Furthermore, for Gaussian noise, the empirical cumulatives $F_n$ for the different samples $\boldsymbol{x}$ exhibit the largest variability.
  • Figure 5: (a) Multiple cumulatives; (b) Var statistic.
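The non-linear link between margin and $p_A$ in panel (c) of Figure 3 can be made concrete in one dimension. This sketch models $p_A$ as the probability that a latent code with signed margin $m$ stays on the top-1 side under additive noise. It compares Gaussian and Laplace noise with the same standard deviation. The 1-D projection and the choice of Laplace as the alternative distribution are illustrative assumptions, not the paper's setup.

```python
import math
from statistics import NormalDist


def p_a_gaussian(margin: float, sigma: float = 1.0) -> float:
    # P(margin + eps > 0) with eps ~ N(0, sigma^2); equals Phi(margin / sigma).
    return NormalDist(0.0, sigma).cdf(margin)


def p_a_laplace(margin: float, sigma: float = 1.0) -> float:
    # Laplace noise with the same standard deviation sigma (scale b = sigma / sqrt(2)).
    # By symmetry, P(margin + eps > 0) equals the Laplace CDF evaluated at margin.
    b = sigma / math.sqrt(2.0)
    if margin >= 0:
        return 1.0 - 0.5 * math.exp(-margin / b)
    return 0.5 * math.exp(margin / b)


m = 0.5
for name, p in [("gaussian", p_a_gaussian), ("laplace", p_a_laplace)]:
    # Tripling the margin does not triple p_A, and the growth factor depends on the noise.
    ratio = p(3 * m) / p(m)
    print(f"{name}: p_A(m)={p(m):.3f}, p_A(3m)={p(3 * m):.3f}, ratio={ratio:.2f}")
```

Both ratios stay far below 3 and differ from each other. This is why the same $p_A$ can imply different margins under different noise models.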

Theorems & Definitions (7)

  • Proposition 2.1
  • Proof sketch
  • Proposition 4.1
  • Proposition 1.1
  • Proof
  • Proposition 1.2
  • Proof