Representation Norm Amplification for Out-of-Distribution Detection in Long-Tail Learning

Dong Geun Shin; Hye Won Chung

TL;DR

The paper tackles out-of-distribution (OOD) detection for models trained on long-tailed datasets. It introduces Representation Norm Amplification (RNA), which decouples OOD detection from in-distribution (ID) classification: the norm of the representation serves as the OOD score, while the logits remain dedicated to classification. An RNA loss enlarges the representation norms of ID samples, and auxiliary OOD data are used only to update the running statistics of the batch normalization (BN) layers. Compared with state-of-the-art long-tail (LT) OOD methods, RNA improves FPR95 by 1.70% on CIFAR10-LT and 9.46% on ImageNet-LT and raises classification accuracy by 2.43% and 6.87%, respectively, with a clearly visible gap between the representation norms of ID and OOD data. The method scales to large datasets and is robust across imbalance ratios, though near-OOD detection remains challenging in some LT cases. Overall, RNA is a practical single-model approach that mitigates the trade-off between OOD detection and LT classification.
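The idea of scoring samples by representation norm can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names and the threshold are hypothetical, and the loss form `-log(1 + norm)` is an assumption chosen only because it decreases as ID norms grow (this summary does not state the exact RNA loss).

```python
import math


def representation_norm(features):
    """L2 norm of one feature vector; used here as the OOD score."""
    return math.sqrt(sum(v * v for v in features))


def is_ood(features, threshold):
    """Flag a sample as OOD when its representation norm is below the threshold.

    The threshold is a hypothetical value that would be tuned on held-out data.
    """
    return representation_norm(features) < threshold


def norm_amplification_loss(id_batch):
    """Assumed illustrative loss: mean of -log(1 + norm) over ID samples.

    Minimizing it pushes the norms of ID representations upward.
    """
    total = sum(math.log1p(representation_norm(f)) for f in id_batch)
    return -total / len(id_batch)
```

Under this sketch, a sample with a small norm is treated as OOD, and a batch of ID samples with larger norms yields a lower loss value.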

Abstract

Detecting out-of-distribution (OOD) samples is a critical task for reliable machine learning. However, it becomes particularly challenging when the models are trained on long-tailed datasets, as the models often struggle to distinguish tail-class in-distribution samples from OOD samples. We examine the main challenges in this problem by identifying the trade-offs between OOD detection and in-distribution (ID) classification faced by existing methods. We then introduce our method, called Representation Norm Amplification (RNA), which solves this challenge by decoupling the two problems. The main idea is to use the norm of the representation as a new dimension for OOD detection, and to develop a training method that generates a noticeable discrepancy in the representation norm between ID and OOD data, while not perturbing the feature learning for ID classification. Our experiments show that RNA outperforms the state-of-the-art methods in both OOD detection and classification, improving FPR95 by 1.70% and 9.46% and classification accuracy by 2.43% and 6.87% on CIFAR10-LT and ImageNet-LT, respectively. The code for this work is available at https://github.com/dgshin21/RNA.
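For readers unfamiliar with the FPR95 metric quoted above, the following sketch shows one common way to compute it: fix a threshold at which at least 95% of ID samples are accepted, then report the fraction of OOD samples that are also accepted. The function name and the convention that a higher score means "more ID-like" are assumptions for illustration; evaluation code in practice may interpolate the threshold differently.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """False positive rate on OOD data when at least 95% of ID data is accepted.

    Assumes higher scores indicate ID samples (for example, representation norms).
    """
    sorted_id = sorted(id_scores)
    # Allow at most 5% of ID samples to fall below the threshold.
    k = (5 * len(sorted_id)) // 100
    threshold = sorted_id[k]
    # OOD samples scoring at or above the threshold are wrongly accepted as ID.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)
```

Lower values are better: a value of 0 means no OOD sample scores as high as the accepted ID samples.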

Paper Structure (53 sections, 9 equations, 8 figures, 16 tables, 2 algorithms)

Introduction
Related works
OOD detection in long-tail learning
Norm-based OOD detection
Background
OOD detection
Long-tail learning
Motivation: trade-offs between OOD detection and long-tailed recognition
Representation Norm Amplification (RNA)
Previous ways of exposing auxiliary OOD data during training
OE perturbs tail classification: gradient analysis
Proposed OOD scoring and training method with auxiliary OOD data
Representation Norm (RN) score
Representation Norm Amplification (RNA)
Effect of auxiliary OOD data: regularizing the activation ratio
...and 38 more sections

Figures (8)

Figure 1: The OOD detection performance (FPR95) and classification accuracy (Accuracy) of models trained by various methods. OE (Hendrycks et al., 2019), designed for OOD detection, and LA (Menon et al., 2021), aimed at long-tailed recognition (LTR), exhibit trade-offs between OOD detection and ID classification when combined (LA+OE). In contrast, our method (RNA) excels in both FPR95 and Accuracy, effectively overcoming these trade-offs.
Figure 2: (a) The gradient ratio of ID classification loss to OOD detection loss with respect to the classifier weight of the LA+OE model trained on CIFAR10-LT, i.e., $\log(\|\nabla_{w_c} \mathcal{L}_{\text{ID}}\|_1/\|\nabla_{w_c} \mathcal{L}_{\text{OOD}}\|_1)$. In particular, $\mathcal{L}_{\text{LA}}$ and $\mathcal{L}_{\text{OE}}$ are used as $\mathcal{L}_{\text{ID}}$ and $\mathcal{L}_{\text{OOD}}$, respectively. Note that the log-ratio for tail classes is less than zero at the early stage of training, indicating that the gradient update is dominated by OOD data rather than ID data. (b) The activation ratio of ID and OOD representations at the last ReLU layer in the models trained by RNA, LA, and OE. CIFAR10 and SVHN are used as ID and OOD sets, respectively.
Figure 3: (a) During training, RNA uses both ID and auxiliary OOD data. The network parameters are updated to minimize the classification loss $\mathcal{L}_{\text{LA}}$ (Equation \ref{eqn:LA_loss}) of the ID samples, regularized by their representation norms through the RNA loss $\mathcal{L}_{\text{RNA}}$ (Equation \ref{eqn:RNA_loss_reg}). The OOD data contribute to the model update only indirectly, since they are used solely to update the running statistics of the BN layers. (b) This illustration shows the latent values of ID (red) and OOD (blue) samples before the last BN layer of the RNA-trained model. The error bars denote the maximum and minimum values among the latent vectors averaged over ID and OOD data, respectively. Over the course of training, the running mean of the BN layer (black) converges to a value between the ID and OOD values. After the last BN and ReLU layers (before the classifier), the shaded region beneath the BN running mean is deactivated. RNA thus produces a noticeable gap between ID and OOD data in both the activation ratio at the last ReLU layer and the representation norm, which serves as the basis for our OOD score.
Figure 4: (a) The histogram of representation norms of ID/OOD test data on LA-trained models on CIFAR10-LT. The red bars represent the density of representation norms of ID data and the blue bars represent that of OOD data (SVHN). (b) The histogram of representation norms of ID/OOD test data on RNA-trained models on CIFAR10-LT. The evident gap in the distributions of representation norms of ID and OOD data enables effective OOD detection using representation norms.
Figure 5: AUC and ACC (%) of the RNA-trained model as the balancing hyperparameter $\lambda$ varies. Blue (x-marked) lines indicate the AUC values and red (circle-marked) lines indicate the ACC values.
...and 3 more figures
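
The quantities plotted in Figures 2 and 3 can be illustrated with a small sketch. It is a simplified reading of the captions rather than the authors' code: the function names are hypothetical, the BN layer is assumed to have unit scale and zero shift, and gradients and latent values are passed in as plain lists.

```python
import math


def gradient_log_ratio(grad_id, grad_ood):
    """log(||grad_id||_1 / ||grad_ood||_1) for one class's classifier weight.

    A negative value means the OOD loss gradient dominates the update.
    """
    l1_id = sum(abs(g) for g in grad_id)
    l1_ood = sum(abs(g) for g in grad_ood)
    return math.log(l1_id / l1_ood)


def bn_relu(latents, running_mean, running_var, eps=1e-5):
    """Normalize with BN running statistics, then apply ReLU.

    Assumes an affine scale of 1 and shift of 0, so values below the
    running mean are zeroed out (deactivated).
    """
    std = math.sqrt(running_var + eps)
    return [max(0.0, (x - running_mean) / std) for x in latents]


def activation_ratio(post_relu_features):
    """Fraction of units that remain strictly positive after the ReLU."""
    active = sum(1 for v in post_relu_features if v > 0)
    return active / len(post_relu_features)
```

With a running mean that sits between typical ID and OOD latent values, ID latents mostly stay active while OOD latents are mostly zeroed, which lowers both the activation ratio and the representation norm of OOD samples.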
