Understanding normalization in contrastive representation learning and out-of-distribution detection
Tai Le-Gia, Jaehyun Ahn
TL;DR
This work analyzes the behavior of the $\ell_2$-norm of contrastive features in representation learning and leverages this insight for out-of-distribution detection. It introduces Outlier Exposure Contrastive Learning (OECL), which augments a contrastive objective with an OE term $\mathcal{L}_{\text{oecl}} = \mathcal{L}_{\text{contrastive}}(z; \tau, M) + \alpha \mathbb{E}_{x \in \mathcal{D}_{\text{oe}}, t \in T_{\text{oe}}}\|f(t(x))\|_2$, enabling the integration of external or self-generated OOD data through outlier exposure transformations. Across unimodal and multimodal benchmarks (CIFAR-10, ImageNet-30, DIOR, Raabin-WBC, HAM10000), OECL delivers competitive or superior OOD detection performance, with self-OECL sometimes outperforming state-of-the-art methods like CSI on standard datasets. The study also demonstrates the diminishing effect of far-OE data as feature quality improves and reports practical gains in few-shot OOD scenarios, underscoring OECL’s robustness and its potential as a baseline for future OE-based anomaly detection research.
Abstract
Contrastive representation learning has emerged as an outstanding approach for anomaly detection. In this work, we explore the $\ell_2$-norm of contrastive features and its applications in out-of-distribution detection. We propose a simple method based on contrastive learning, which incorporates out-of-distribution data by discriminating against normal samples in the contrastive layer space. Our approach can be applied flexibly as an outlier exposure (OE) approach, where the out-of-distribution data is a large collection of random images, or as a fully self-supervised learning approach, where the out-of-distribution data is self-generated by applying distribution-shifting transformations. The ability to incorporate additional out-of-distribution samples enables a feasible solution for datasets where anomaly detection (AD) methods based on contrastive learning generally underperform, such as aerial images or microscopy images. Furthermore, the high-quality features learned through contrastive learning consistently enhance performance in OE scenarios, even when the available out-of-distribution dataset is not diverse enough. Our extensive experiments demonstrate the superiority of our proposed method under various scenarios, including unimodal and multimodal settings, with various image datasets.
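To make the structure of the objective in the TL;DR concrete, the following is a minimal sketch of $\mathcal{L}_{\text{oecl}}$ as a contrastive term plus $\alpha$ times the mean $\ell_2$-norm of the OE features $f(t(x))$. It is an illustration under stated assumptions, not the authors' implementation: the contrastive term is stood in for by a SimCLR-style NT-Xent loss over two views, the role of $M$ is not modeled, and all function names (`nt_xent`, `oe_penalty`, `oecl_loss`) are hypothetical.

```python
import math


def l2_norm(v):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    """Cosine similarity between two nonzero feature vectors."""
    return sum(a * b for a, b in zip(u, v)) / (l2_norm(u) * l2_norm(v))


def nt_xent(views_a, views_b, tau=0.5):
    """NT-Xent loss over paired views.

    Assumption: used here only as a stand-in for L_contrastive(z; tau, M);
    the source does not spell out the exact contrastive loss.
    """
    z = views_a + views_b
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same sample
        logits = [cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        total += log_denom - cosine(z[i], z[pos]) / tau
    return total / (2 * n)


def oe_penalty(oe_features):
    """Empirical mean of ||f(t(x))||_2 over a batch of OE features."""
    return sum(l2_norm(f) for f in oe_features) / len(oe_features)


def oecl_loss(views_a, views_b, oe_features, tau=0.5, alpha=1.0):
    """Contrastive term plus alpha-weighted OE norm penalty."""
    return nt_xent(views_a, views_b, tau) + alpha * oe_penalty(oe_features)
```

Because the OE term enters with a positive sign, minimizing this quantity shrinks the norms of OE features while the contrastive term depends only on the directions of the in-distribution features (under the cosine-similarity assumption above).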
