Exploring Simple, High Quality Out-of-Distribution Detection with L2 Normalization
Jarrod Haas, William Yolland, Bernhard Rabus
TL;DR
This work tackles the unreliable confidence of deep classifiers on out-of-distribution inputs by proposing a remarkably simple baseline: apply L2 normalization to encoder features during training. The method decouples feature magnitude from direction, allowing feature norms to carry rich information about input familiarity without additional losses or tuning, and it can be implemented with two lines of PyTorch code. Empirically, it yields competitive OoD detection performance across several architectures and ID datasets, in some settings with faster training than state-of-the-art methods. The authors connect this behavior to Neural Collapse theory and coherent gradient dynamics, offering a theoretical lens on why feature norms encode useful image information and signaling a promising direction for efficient OoD detection research.
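The following is a minimal PyTorch sketch of the idea described above, not the authors' code. It assumes that the encoder outputs a flat feature vector, that the normalization is applied just before a linear classification head during training, and that the pre-normalization feature norm serves as the OoD score. `L2NormClassifier`, `ood_score`, and the toy encoder are hypothetical names introduced here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class L2NormClassifier(nn.Module):
    """Hypothetical wrapper (not the authors' code): encoder -> L2 norm -> linear head.

    Assumes `encoder` maps a batch of inputs to flat features of size `feat_dim`.
    """

    def __init__(self, encoder: nn.Module, feat_dim: int, num_classes: int):
        super().__init__()
        self.encoder = encoder
        self.fc = nn.Linear(feat_dim, num_classes)

    def forward(self, x: torch.Tensor):
        feats = self.encoder(x)  # (batch, feat_dim)
        # The two added lines (assumed placement): record the magnitude,
        # then pass only the unit-length direction to the classifier head.
        norms = torch.linalg.vector_norm(feats, ord=2, dim=1)  # (batch,) feature norms
        feats = F.normalize(feats, p=2, dim=1)  # each row rescaled to unit L2 norm
        return self.fc(feats), norms


@torch.no_grad()
def ood_score(model: L2NormClassifier, x: torch.Tensor) -> torch.Tensor:
    """Higher = more likely OoD, ASSUMING ID inputs tend to have larger feature norms.

    Note: switches the model to eval mode (matters for BatchNorm in real encoders).
    """
    model.eval()
    _, norms = model(x)
    return -norms


if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy stand-in for a ResNet encoder, for illustration only.
    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU())
    model = L2NormClassifier(encoder, feat_dim=64, num_classes=10)
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    # One ordinary cross-entropy training step; no extra loss terms.
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    logits, _ = model(x)
    loss = F.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()

    print(ood_score(model, x).shape)  # torch.Size([8])
```

In this sketch the classifier is otherwise unchanged: the head only ever sees feature direction, while the magnitude computed before normalization is kept aside as the familiarity signal used at test time.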
Abstract
We demonstrate that L2 normalization over feature space can yield strong Out-of-Distribution (OoD) detection performance for some models and datasets. Although it does not achieve outright state-of-the-art performance, this method is notable for its extreme simplicity: it requires only two additional lines of code and needs no specialized loss functions, image augmentations, outlier exposure, or extra parameter tuning. We also observe that training may be more efficient for some datasets and architectures. Notably, training ResNet18 on CIFAR10 for only 60 epochs (or ResNet50 for 100 epochs) can produce AUROC within two percentage points of several state-of-the-art methods on some near- and far-OoD datasets. We provide theoretical and empirical support for this method and demonstrate its viability across five architectures and three In-Distribution (ID) datasets.
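The abstract reports results as AUROC. As a minimal, dependency-free sketch (the function name `auroc` and the example scores are hypothetical, not data from this work), AUROC can be computed from per-sample scores such as feature norms as the probability that a randomly chosen ID sample scores higher than a randomly chosen OoD sample, with ties counted as one half. On this 0 to 1 scale, "two percentage points" corresponds to a difference of 0.02.

```python
from typing import Sequence


def auroc(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """AUROC with ID as the positive class, where a higher score = more ID-like.

    Computed as the Mann-Whitney statistic: the fraction of (ID, OoD) pairs in
    which the ID sample scores higher, counting ties as 1/2. This is O(n*m),
    fine for illustration; use a rank-based method for large evaluation sets.
    """
    if not id_scores or not ood_scores:
        raise ValueError("need at least one ID score and one OoD score")
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Made-up feature norms, for illustration only.
id_norms = [9.1, 8.4, 7.9, 6.0]
ood_norms = [5.2, 6.0, 4.8]
print(round(auroc(id_norms, ood_norms), 4))  # 0.9583 (11.5 of 12 pairs)
```

For real evaluations, `sklearn.metrics.roc_auc_score(y_true, y_score)` computes the same quantity, with ID samples labelled 1 and scores oriented so that higher means more ID-like.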
