Understanding normalization in contrastive representation learning and out-of-distribution detection

Tai Le-Gia, Jaehyun Ahn

TL;DR

This work analyzes the behavior of the $\ell_2$-norm of contrastive features in representation learning and leverages this insight for out-of-distribution detection. It introduces Outlier Exposure Contrastive Learning (OECL), which augments a contrastive objective with an OE term $\mathcal{L}_{\text{oecl}} = \mathcal{L}_{\text{contrastive}}(z; \tau, M) + \alpha \mathbb{E}_{x \in \mathcal{D}_{\text{oe}}, t \in T_{\text{oe}}}\|f(t(x))\|_2$, enabling the integration of external or self-generated OOD data through outlier exposure transformations. Across unimodal and multimodal benchmarks (CIFAR-10, ImageNet-30, DIOR, Raabin-WBC, HAM10000), OECL delivers competitive or superior OOD detection performance, with self-OECL sometimes outperforming state-of-the-art methods like CSI on standard datasets. The study also demonstrates the diminishing effect of far-OE data as feature quality improves and reports practical gains in few-shot OOD scenarios, underscoring OECL’s robustness and its potential as a baseline for future OE-based anomaly detection research.
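To make the shape of the objective concrete, the following is a minimal NumPy sketch of $\mathcal{L}_{\text{oecl}}$, not the authors' implementation. It assumes a SimCLR-style NT-Xent loss as a stand-in for $\mathcal{L}_{\text{contrastive}}$, computed on $\ell_2$-normalized features, and an OE term taken as the mean unnormalized $\ell_2$-norm of the OE features. The function names (`nt_xent`, `oecl_loss`) and the default values of `tau` and `alpha` are hypothetical, and the role of $M$ is not spelled out in this summary, so the sketch simply uses every other sample in the batch as a negative.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Scale each row of z to unit l2-norm."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def nt_xent(z_a, z_b, tau=0.5):
    """NT-Xent loss over two views (assumed stand-in for L_contrastive)."""
    n = z_a.shape[0]
    z = l2_normalize(np.concatenate([z_a, z_b], axis=0))
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)  # a sample is never its own negative
    # The positive for row i is the other view of the same sample.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))

def oecl_loss(z_a, z_b, z_oe, tau=0.5, alpha=1.0):
    """Contrastive loss on normal data plus alpha * mean l2-norm of OE features."""
    oe_term = float(np.linalg.norm(z_oe, axis=1).mean())  # unnormalized norms
    return nt_xent(z_a, z_b, tau) + alpha * oe_term
```

Here `z_a` and `z_b` stand for contrastive-layer outputs of two augmented views of the normal batch, and `z_oe` for the outputs $f(t(x))$ on transformed OE samples. Minimizing the second term pushes the norm of OE features toward zero while the contrastive term itself is insensitive to feature scale.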

Abstract

Contrastive representation learning has emerged as an outstanding approach for anomaly detection. In this work, we explore the $\ell_2$-norm of contrastive features and its applications in out-of-distribution detection. We propose a simple method based on contrastive learning, which incorporates out-of-distribution data by discriminating against normal samples in the contrastive layer space. Our approach can be applied flexibly as an outlier exposure (OE) approach, where the out-of-distribution data is a huge collection of random images, or as a fully self-supervised learning approach, where the out-of-distribution data is self-generated by applying distribution-shifting transformations. The ability to incorporate additional out-of-distribution samples enables a feasible solution for datasets where AD methods based on contrastive learning generally underperform, such as aerial images or microscopy images. Furthermore, the high-quality features learned through contrastive learning consistently enhance performance in OE scenarios, even when the available out-of-distribution dataset is not diverse enough. Our extensive experiments demonstrate the superiority of our proposed method under various scenarios, including unimodal and multimodal settings, with various image datasets.
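As an illustration of how the $\ell_2$-norm of contrastive features could serve out-of-distribution detection, the sketch below scores each sample by its feature norm and evaluates the separation with AUROC. It assumes that a larger norm indicates a more normal sample; the helper names `norm_score` and `auroc` are hypothetical and are not taken from the paper.

```python
import numpy as np

def norm_score(features):
    """Normality score: l2-norm of each (unnormalized) contrastive feature row."""
    return np.linalg.norm(features, axis=1)

def auroc(scores_normal, scores_ood):
    """Probability that a random normal sample outscores a random OOD sample.

    Ties count as one half, which matches the usual AUROC definition.
    """
    s_n = np.asarray(scores_normal, dtype=float)[:, None]
    s_o = np.asarray(scores_ood, dtype=float)[None, :]
    return float(np.mean((s_n > s_o) + 0.5 * (s_n == s_o)))
```

A call such as `auroc(norm_score(feats_normal), norm_score(feats_ood))` returns 1.0 when every normal feature has a larger norm than every OOD feature and 0.5 when the two sets are indistinguishable by norm. The pairwise comparison is quadratic in the number of samples, which is fine for a sketch but not for large test sets.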

Paper Structure

This paper contains 29 sections, 1 theorem, 14 equations, 8 figures, 9 tables.

Key Result

Lemma 1

For any $\mu>0$, $f(y)$ is a non-increasing and convex function on $\left[0, \infty\right)$ where

Figures (8)

  • Figure 1: The $\ell_2$-norm of contrastive features for both normal and OOD samples during optimization of $\mathcal{L}_{\text{contrastive}}$ using only normal samples. Following a few epochs of rapid increase, the $\ell_2$-norm for both normal and OOD samples gradually decreases. Notably, as training progresses, the $\ell_2$-norm of the training samples becomes larger than that of the OOD samples.
  • Figure 2: Averages of $\mu/\sigma_v$ and $\sigma_v$ for both normal and OOD samples during the optimization of $\mathcal{L}_{\text{contrastive}}$ using only normal samples. While there is only a minimal difference in $\sigma_v$ between normal and OOD samples, the ratio $\mu/\sigma_v$ for normal samples becomes significantly larger as the training progresses.
  • Figure 3: AUROC scores across varying training data sizes for "near" and "far" OE datasets. The training dataset consists of lymphocyte and monocyte images, with eosinophil images from the Raabin-WBC dataset employed for testing. A combination of basophil and neutrophil images serves as the "near" OE, while ImageNet images serve as the "far" OE. As the training size increases, the influence of the "far" OE on anomaly detection diminishes, while "near" OE continues to synergize with the training data.
  • Figure 4: The training loss and $\ell_2$-norm of ID and OE data when we train self-OECL on the monocyte class of Raabin-WBC. When the OE data and training data overlap, directly minimizing the $\ell_2$-norm of contrastive features makes training unstable and yields meaningless features (during the initial 50 epochs, we set $\alpha=0$; more information about training is presented in the warm-up appendix).
  • Figure 5: $\ell_2$-norm of contrastive features for both normal and OOD samples during optimization of $\mathcal{L}_{\text{contrastive}}$ without $\ell_2$-normalization, using only normal samples.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Lemma 1
  • proof