MANO: Exploiting Matrix Norm for Unsupervised Accuracy Estimation Under Distribution Shifts

Renchunzi Xie; Ambroise Odonnat; Vasilii Feofanov; Weijian Deng; Jianfeng Zhang; Bo An

TL;DR

This work first studies the relationship between logits and generalization performance through the lens of the low-density separation assumption. It then proposes MaNo, which applies a data-dependent normalization to the logits to reduce prediction bias and takes the $L_p$ norm of the matrix of normalized logits as the estimation score.

Abstract

Leveraging a model's outputs, specifically the logits, is a common approach to estimating the test accuracy of a pre-trained neural network on out-of-distribution (OOD) samples without access to the corresponding ground-truth labels. Despite their ease of implementation and computational efficiency, current logit-based methods are vulnerable to overconfidence, which leads to prediction bias, especially under natural shifts. In this work, we first study the relationship between logits and generalization performance from the view of the low-density separation assumption. Our findings motivate our proposed method, MaNo, which (1) applies a data-dependent normalization to the logits to reduce prediction bias, and (2) takes the $L_p$ norm of the matrix of normalized logits as the estimation score. Our theoretical analysis highlights the connection between this score and the model's uncertainty. We conduct an extensive empirical study on common unsupervised accuracy estimation benchmarks and demonstrate that MaNo achieves state-of-the-art performance across various architectures in the presence of synthetic, natural, or subpopulation shifts. The code is available at \url{https://github.com/Renchunzi-Xie/MaNo}.
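The two steps named in the abstract (normalize the logits, then take an $L_p$ norm of the resulting matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, plain softmax and a truncated Taylor expansion of the exponential stand in for the paper's data-dependent normalization, and both the default `p=4` and the division by the number of matrix entries are assumptions made here so that the score is comparable across dataset sizes.

```python
import math


def softmax_row(row):
    """Standard softmax over one row of logits (stand-in normalization)."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def taylor_row(row, order=2):
    """Softmax-like normalization using a Taylor polynomial of exp.

    Assumption: order 2 keeps every term positive; odd orders can go
    negative for very negative logits, so the score takes absolute values.
    """
    polys = [sum(v ** k / math.factorial(k) for k in range(order + 1))
             for v in row]
    total = sum(polys)
    return [q / total for q in polys]


def matrix_norm_score(logits, p=4, normalize=softmax_row):
    """Entry-wise L_p norm of the normalized logit matrix.

    Dividing by N * K (samples times classes) is an assumption of this sketch.
    """
    normalized = [normalize(row) for row in logits]
    n, k = len(normalized), len(normalized[0])
    total = sum(abs(v) ** p for row in normalized for v in row)
    return (total / (n * k)) ** (1.0 / p)


# Usage: confident predictions give a larger score than uniform ones.
uniform_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(5)]
confident_logits = [[50.0, 0.0, 0.0, 0.0] for _ in range(5)]
low = matrix_norm_score(uniform_logits)
high = matrix_norm_score(confident_logits)
```

With uniform predictions over four classes every normalized entry is 0.25, so the score is 0.25; with near one-hot predictions it approaches $(1/4)^{1/4}$. A higher score therefore corresponds to more confident predictions on the unlabeled test set.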

Paper Structure (72 sections, 31 equations, 11 figures, 7 tables, 1 algorithm)

Introduction
Summary of our contributions.
Problem Statement
Setting.
Unsupervised accuracy estimation.
What Explains the Correlation between Logits and Test Accuracy?
Motivation
Logits reflect the distances to decision boundaries.
Low-density separation assumption.
Assumptions on the prediction bias.
MaNo: Predicting Generalization Performance With Matrix Norm of Logits
Step 1: Normalization.
Step 2: Aggregation.
Theoretical Analysis of MaNo
How to Alleviate Overconfidence Issues of Logit-Based Methods?
...and 57 more sections

Figures (11)

Figure 1: Illustration of the LDS assumption. When the boundary passes through dense regions (a), margins have little predictive power and cannot be used without labels. On the contrary, margins are informative in sparse regions (b).
Figure 2: Empirical evidence with ResNet18. (a) The model is well-calibrated on Office-Home and miscalibrated on PACS. (b) softrun is superior to the state-of-the-art Nuclear (Deng et al., 2023) in all scenarios, while the softmax fails heavily on PACS. (c) Increasing the approximation order $n$ in Eq. \ref{eq:taylor} is detrimental on PACS and beneficial on Office-Home. The optimal trade-off across all calibration scenarios is $n \in \{2, 3\}$.
Figure 3: OOD error prediction versus ground-truth error on Entity-13 with ResNet18. This scatter plot compares MaNo with Dispersion Score and ProjNorm. Each point represents one dataset under a specific type and severity of corruption. Different shapes indicate different types of corruption, while darker colors indicate higher severity levels. This indicates the qualitative superiority of MaNo.
Figure 4: $R^2$ distribution with ResNet18 on all distribution shifts. Overall, MaNo leads to the best and most robust estimations.
Figure 5: $R^2$ distribution using ResNets (average), ConvNext, or ViT on all distribution shifts. Again, MaNo is the best method.
...and 6 more figures

Theorems & Definitions (7)

proof
