
MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

Renchunzi Xie, Ambroise Odonnat, Vasilii Feofanov, Weijian Deng, Jianfeng Zhang, Bo An

TL;DR

This work first studies the relationship between logits and generalization performance from the perspective of the low-density separation assumption. It then proposes MaNo, a method that applies a data-dependent normalization to the logits to reduce prediction bias and takes the $L_p$ norm of the matrix of normalized logits as the estimation score.

Abstract

Leveraging the models' outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without requiring access to the corresponding ground-truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence issues, leading to prediction bias, especially under natural shift. In this work, we first study the relationship between logits and generalization performance from the perspective of the low-density separation assumption. Our findings motivate our proposed method MaNo, which (1) applies a data-dependent normalization to the logits to reduce prediction bias, and (2) takes the $L_p$ norm of the matrix of normalized logits as the estimation score. Our theoretical analysis highlights the connection between the provided score and the model's uncertainty. We conduct an extensive empirical study on common unsupervised accuracy estimation benchmarks and demonstrate that MaNo achieves state-of-the-art performance across various architectures in the presence of synthetic, natural, or subpopulation shifts. The code is available at https://github.com/Renchunzi-Xie/MaNo.
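A minimal NumPy sketch of the score described above, under stated assumptions: each row of logits is normalized either with the softmax or with the exponential replaced by its order-$n$ Taylor polynomial (MaNo chooses the normalization from the data; that selection rule is not reproduced here and is left to the caller), and the score is the entrywise $L_p$ norm of the $N \times K$ matrix scaled by $(NK)^{-1/p}$. The function names and the default p = 4 are illustrative assumptions, not the official code in the repository linked above.

```python
from typing import Optional

import numpy as np


def normalize_logits(logits: np.ndarray, order: Optional[int] = None) -> np.ndarray:
    """Turn an (N, K) logit matrix into row-normalized class scores (hypothetical helper).

    order=None uses the softmax. An integer order n replaces exp(x) with its
    Taylor polynomial 1 + x + x^2/2! + ... + x^n/n!.
    """
    logits = np.asarray(logits, dtype=float)
    if order is None:
        # Softmax; subtracting the row max avoids overflow and does not change the result.
        v = np.exp(logits - logits.max(axis=1, keepdims=True))
    else:
        # Truncated Taylor series of exp around 0. Even orders stay strictly positive;
        # odd orders can turn negative for very negative logits (not guarded here).
        v = np.zeros_like(logits)
        term = np.ones_like(logits)
        for k in range(order + 1):
            v += term                       # add x^k / k!
            term = term * logits / (k + 1)  # next term x^(k+1) / (k+1)!
    return v / v.sum(axis=1, keepdims=True)


def mano_score(logits: np.ndarray, p: float = 4.0, order: Optional[int] = None) -> float:
    """Scaled entrywise L_p norm of the normalized logit matrix (hypothetical helper)."""
    q = normalize_logits(logits, order)
    # mean(|q|^p) equals ||Q||_p^p / (N*K); with entries in [0, 1] the score is in [0, 1].
    return float(np.mean(np.abs(q) ** p) ** (1.0 / p))


confident = np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]])  # peaked predictions
uniform = np.zeros((2, 3))                                   # maximally uncertain predictions
print(mano_score(confident) > mano_score(uniform))
print(round(mano_score(uniform), 4))
```

With a uniform prediction matrix every entry is 1/K, so the score reduces to 1/K; larger p weights each row's largest entry more heavily, so in this sketch the score grows as predictions become more confident.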

Paper Structure (72 sections, 31 equations, 11 figures, 7 tables, 1 algorithm)

Figures (11)

  • Figure 1: Illustration of the LDS assumption. When the boundary passes through dense regions (a), margins have little predictive power and cannot be used without labels. On the contrary, margins are informative in sparse regions (b).
  • Figure 2: Empirical evidence with ResNet18. (a) The model is well calibrated on Office-Home and miscalibrated on PACS. (b) softrun outperforms the state-of-the-art Nuclear (Deng et al., 2023) in all scenarios, while the softmax fails badly on PACS. (c) Increasing the order $n$ of the Taylor approximation is detrimental on PACS and beneficial on Office-Home. Taking $n \in \{2, 3\}$ gives the best trade-off across all calibration scenarios.
  • Figure 3: OOD error prediction versus ground-truth error on Entity-13 with ResNet18. This scatter plot compares MaNo with Dispersion Score and ProjNorm. Each point represents one dataset under a specific type and severity of corruption. Different shapes indicate different types of corruption, while darker colors indicate higher severity levels. This indicates the qualitative superiority of MaNo.
  • Figure 4: $R^2$ distribution with ResNet18 on all distribution shifts. Overall, MaNo gives the best and most robust estimates.
  • Figure 5: $R^2$ distribution using ResNets (average), ConvNeXt, or ViT on all distribution shifts. Again, MaNo is the best method.
  • ...and 6 more figures
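Figures 3 to 5 evaluate how well a score tracks ground-truth accuracy (or error) across many shifted test sets, summarized by $R^2$. A minimal sketch of that kind of evaluation, assuming a least-squares linear fit of accuracy on score over shifted sets whose labels are known, and the standard coefficient of determination; the helper names and the numbers below are hypothetical placeholders, and the paper's exact protocol may differ.

```python
import numpy as np


def fit_score_to_accuracy(scores, accuracies):
    """Least-squares line accuracy ~ a * score + b and its R^2 (hypothetical helper)."""
    x = np.asarray(scores, dtype=float)
    y = np.asarray(accuracies, dtype=float)
    a, b = np.polyfit(x, y, deg=1)        # slope, intercept
    pred = a * x + b
    ss_res = np.sum((y - pred) ** 2)      # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return a, b, 1.0 - ss_res / ss_tot


def predicted_error(score, a, b):
    """Estimated error (1 - accuracy) for a new, unlabeled test set."""
    return 1.0 - (a * score + b)


# Synthetic placeholder numbers, not results from the paper.
scores = [0.35, 0.42, 0.55, 0.61, 0.70]
accs = [0.40, 0.52, 0.71, 0.78, 0.90]
a, b, r2 = fit_score_to_accuracy(scores, accs)
print(round(r2, 3))
print(round(predicted_error(0.5, a, b), 3))
```

Because $R^2$ is computed for the fitted line, it measures linear agreement between score and accuracy; an $R^2$ close to 1 means the score alone almost determines accuracy on these sets.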

Theorems & Definitions (7)

  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • proof