Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective
Kun Fang, Qinghua Tao, Xiaolin Huang, Jie Yang
TL;DR
This work reframes out-of-distribution (OoD) detection through a loss-landscape lens. It shows that independently trained modes that are optimal on in-distribution (InD) data share low InD loss yet exhibit diverse OoD loss landscapes, which leads to high cross-mode variance in OoD detection performance. Revisiting deep ensembles, the authors propose mode ensemble strategies that aggregate logits or features across multiple modes. These strategies yield more stable and improved OoD detection across a range of detectors and network architectures, including Vision Transformers. A theoretical result shows that, under a Gaussian data model, mode ensembles reduce the probit-transformed accuracy gap between InD and OoD data relative to the average of single modes. Empirically, high variances across independent modes are demonstrated on CIFAR-10 and ImageNet-1K, and mode ensembles consistently reduce this variance and improve detection metrics such as FPR and AUROC, with gains that depend on the detector and architecture. The work highlights the value of considering independent modes for reliable OoD evaluation and lays the groundwork for more robust, ensemble-based OoD detectors in practical deployment.
Abstract
Existing Out-of-Distribution (OoD) detection methods aim to distinguish OoD samples from In-Distribution (InD) data mainly by exploring differences in the features, logits, and gradients of Deep Neural Networks (DNNs). In this work, we propose a new perspective on OoD detection based on the loss landscape and mode ensemble. In the optimization of DNNs, the parameter space contains many local optima, also called modes. Interestingly, we observe that these independent modes all reach low-loss regions on InD data (both training and test data), yet yield significantly different loss landscapes on OoD data. This observation offers a novel view of OoD detection through the loss landscape and further suggests that OoD detection performance fluctuates significantly across these modes. For instance, the FPR values of the RankFeat method range from 46.58% to 84.70% across 5 modes, indicating that evaluations of detection performance are uncertain across independent modes. Motivated by this diversity of OoD loss landscapes across modes, we revisit the deep ensemble method for OoD detection through mode ensemble, which improves performance and reduces the variance of OoD detectors. Extensive experiments covering various OoD detectors and network architectures reveal high variances across modes and validate the superiority of mode ensemble in boosting OoD detection. We hope this work draws attention to independent modes in the loss landscape of OoD data and encourages more reliable evaluations of OoD detectors.
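The evaluation described above (scoring each mode separately, then scoring an ensemble of modes) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the ensemble simply averages the logits of independently trained modes, uses the energy score as a stand-in detector (the paper covers several detectors, such as RankFeat), and reports FPR at 95% TPR. The function names `mode_ensemble_logits`, `energy_score`, `fpr_at_95_tpr`, and `evaluate_modes` are hypothetical.

```python
import numpy as np


def logsumexp(x, axis=-1):
    # Numerically stable log-sum-exp along the given axis.
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def energy_score(logits, temperature=1.0):
    # Negative free energy; assumed detector: higher score means "more InD-like".
    return temperature * logsumexp(logits / temperature, axis=-1)


def mode_ensemble_logits(logits_per_mode):
    # Hypothetical logit-level mode ensemble.
    # logits_per_mode has shape (num_modes, num_samples, num_classes).
    return np.mean(logits_per_mode, axis=0)


def fpr_at_95_tpr(ind_scores, ood_scores):
    # Threshold chosen so that about 95% of InD samples score at or above it;
    # FPR is the fraction of OoD samples that still pass the threshold.
    threshold = np.percentile(ind_scores, 5)
    return float(np.mean(ood_scores >= threshold))


def evaluate_modes(ind_logits, ood_logits):
    # Per-mode FPR (whose spread reflects cross-mode variance) and ensemble FPR.
    per_mode = [
        fpr_at_95_tpr(energy_score(i), energy_score(o))
        for i, o in zip(ind_logits, ood_logits)
    ]
    ensemble = fpr_at_95_tpr(
        energy_score(mode_ensemble_logits(ind_logits)),
        energy_score(mode_ensemble_logits(ood_logits)),
    )
    return per_mode, ensemble


if __name__ == "__main__":
    # Synthetic logits only, to exercise the code path (not real results).
    rng = np.random.default_rng(0)
    ind = rng.normal(loc=1.0, size=(5, 200, 10))  # 5 modes, 200 InD samples
    ood = rng.normal(loc=0.0, size=(5, 200, 10))  # 5 modes, 200 OoD samples
    per_mode, ensemble = evaluate_modes(ind, ood)
    print("per-mode FPR std:", np.std(per_mode), "ensemble FPR:", ensemble)
```

Averaging logits is only one aggregation choice; the TL;DR also mentions aggregating features, which would replace `mode_ensemble_logits` with an average over per-mode feature vectors before the detector is applied.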
