Revisiting Deep Ensemble for Out-of-Distribution Detection: A Loss Landscape Perspective

Kun Fang; Qinghua Tao; Xiaolin Huang; Jie Yang

TL;DR

This work reframes OoD detection through a loss-landscape lens, revealing that independently trained InD-optimal modes share low in-distribution loss yet exhibit diverse out-of-distribution loss landscapes, which leads to high cross-mode variance in OoD performance. By revisiting deep ensemble methods, the authors propose mode ensemble strategies that aggregate logits or features across multiple modes, yielding more stable and improved OoD detection across a range of detectors and network architectures, including Vision Transformers. A theoretical result shows that mode ensembles reduce the probit-transformed accuracy gap between InD and OoD data compared to averaging single modes, under a Gaussian data model. Empirically, high variances across independent modes are demonstrated on CIFAR-10 and ImageNet-1K, and mode ensembles consistently reduce variance and boost detection metrics like FPR and AUROC, with detector- and architecture-dependent gains. The work highlights the value of considering independent modes for reliable OoD evaluation and sets the stage for more robust, ensemble-based OoD detectors in practical deployment.
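
The cross-mode variance mentioned above can be made concrete with a small sketch. It assumes the common convention that FPR is measured at a threshold accepting 95% of InD samples (FPR95) and that a higher score means "more InD-like". The function names `fpr_at_95_tpr` and `summarize_modes` are hypothetical helpers, not part of the paper's code.

```python
import numpy as np

def fpr_at_95_tpr(ind_scores, ood_scores):
    """FPR at the threshold that accepts ~95% of InD samples.

    Assumes a higher score means "more InD-like".
    """
    ind_scores = np.asarray(ind_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # About 95% of InD scores lie at or above the 5th percentile.
    threshold = np.percentile(ind_scores, 5)
    # OoD samples at or above the threshold are wrongly accepted as InD.
    return float(np.mean(ood_scores >= threshold))

def summarize_modes(per_mode_scores):
    """per_mode_scores: list of (ind_scores, ood_scores) pairs, one per mode."""
    fprs = np.array([fpr_at_95_tpr(ind, ood) for ind, ood in per_mode_scores])
    return {
        "fpr95": fprs,
        "mean": float(fprs.mean()),
        "std": float(fprs.std()),
        "range": float(fprs.max() - fprs.min()),
    }
```

Calling `summarize_modes` on the scores of several independently trained modes gives the spread that a single-mode evaluation would hide.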

Abstract

Existing Out-of-Distribution (OoD) detection methods aim to separate OoD samples from In-Distribution (InD) data mainly by exploring differences in features, logits and gradients in Deep Neural Networks (DNNs). In this work, we propose a new perspective based on the loss landscape and mode ensemble to investigate OoD detection. In the optimization of DNNs, there exist many local optima in the parameter space, namely modes. Interestingly, we observe that these independent modes, which all reach low-loss regions on InD data (training and test data), yield significantly different loss landscapes on OoD data. Such an observation provides a novel view for investigating OoD detection from the loss landscape, and further suggests significantly fluctuating OoD detection performance across these modes. For instance, FPR values of the RankFeat method can range from 46.58% to 84.70% among 5 modes, showing that detection performance evaluations are uncertain across independent modes. Motivated by such diversity in the OoD loss landscape across modes, we revisit the deep ensemble method for OoD detection through mode ensemble, leading to improved performance and benefiting the OoD detector with reduced variances. Extensive experiments covering varied OoD detectors and network structures illustrate high variances across modes and validate the superiority of mode ensemble in boosting OoD detection. We hope this work could draw attention to independent modes in the loss landscape of OoD data and to more reliable evaluations of OoD detectors.
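
A minimal sketch of a logit-level mode ensemble combined with an energy-style score may help fix ideas. It assumes that aggregation means a plain average of logits over modes and that the detector scores a sample by the temperature-scaled log-sum-exp of its logits. The function names, array shapes, and random placeholder logits are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """T * logsumexp(logits / T) per sample; larger means more InD-like."""
    z = np.asarray(logits, dtype=float) / temperature
    m = z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    return temperature * (m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1)))

def mode_ensemble_logits(per_mode_logits):
    """Average logits over modes (assumed aggregation rule).

    Input shape: (num_modes, num_samples, num_classes).
    """
    return np.asarray(per_mode_logits, dtype=float).mean(axis=0)

# Random placeholder standing in for the outputs of 3 independently trained modes.
rng = np.random.default_rng(0)
per_mode_logits = rng.normal(size=(3, 4, 10))

ens_logits = mode_ensemble_logits(per_mode_logits)  # shape (4, 10)
scores = energy_score(ens_logits)                   # one score per sample
```

Thresholding `scores` then gives a detector decision per sample. The same averaging step could be applied to features instead of logits for feature-based detectors.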

Paper Structure (45 sections, 2 theorems, 24 equations, 6 figures, 20 tables)

Introduction
OoD detection with ensemble modes
Preliminary: OoD detection
Mode ensemble for OoD detection
Loss landscape
Mode ensemble
Discussions with other ensemble methods for OoD detection vyas2018out, yang2021ensemble, xue2022boosting
Theoretical analysis
Experiments
Setups
High variances among the independent modes for OoD detection
Mode ensemble stabilizes and boosts OoD detection
Results on ViT models
In-depth empirical discussions on mode ensemble
On the independence of ensembling modes
...and 30 more sections

Key Result

Proposition 1

Consider the in-distribution ${\cal P}_{\rm in}$, the out-distribution ${\cal P}_{\rm out}$, and $N$ independent modes $g_i, i=1,\cdots,N$ defined above. We have $\mathcal{G}(g_{\rm ens},\mathcal{P}_{\rm in},\mathcal{P}_{\rm out})\leq\frac{1}{N}\sum_{i=1}^N\mathcal{G}(g_i,\mathcal{P}_{\rm in},\mathcal{P}_{\rm out})$, where $\mathcal{G}$ denotes the gap of the probit-transformed accuracy between $\mathcal{P}_{\rm in}$ and $\mathcal{P}_{\rm out}$.

Figures (6)

Figure 1: An illustration of the loss landscape of InD and OoD data, as a function of network weights in a two-dimensional subspace. The 3 isolated modes are located in similar low-loss regions on the InD data, but in significantly different loss areas for the OoD data. The visualization technique follows garipov2018loss. Refer to Fig. \ref{fig:method-loss-landscape} for more details on each mode.
Figure 2: There exists a high variance among the OoD detection results (FPR95) of 3 independent modes on 2 OoD detectors, Energy liu2020energy and RankFeat song2022rankfeat (the right panel), while the 3 modes all hold good and similar recognition accuracy on the InD test set (the left panel).
Figure 3: An illustration of the feature trajectories during model training on CIFAR10 (InD) w.r.t. 3 random seeds. 48 checkpoints are sampled from each training run. The features of CIFAR10 (left), LSUN (OoD, middle) and SVHN (OoD, right) learned by these checkpoints are reduced to 2 dimensions via t-SNE van2008visualizing, clearly showing the resulting 3 isolated modes.
Figure 4: An illustration of the loss landscapes of 3 independent modes (left, middle and right, respectively) on the InD (top) and OoD (bottom) data. The visualization technique follows li2018visualizing.
Figure 5: Ablation studies on the independence of modes. Each data point indicates the results of one single mode (circle dots) or ensembled modes (plus markers) with InD accuracy (y-axis) and average detection FPR over multiple OoD data sets (x-axis) achieved by Energy liu2020energy.
...and 1 more figures

Theorems & Definitions (3)

Proposition 1
Proposition 1$'$
proof
