Scalable Ensemble Diversification for OOD Generalization and Detection

Alexander Rubinstein; Luca Scimeca; Damien Teney; Seong Joon Oh

TL;DR

This work presents a method for Scalable Ensemble Diversification applicable to large-scale settings (e.g. ImageNet) that does not require OOD samples and turns the diversity of ensemble hypotheses into a novel uncertainty score estimator that surpasses a large number of OOD detection baselines.

Abstract

Training a diverse ensemble of models has several practical applications, such as providing candidates for model selection with better out-of-distribution (OOD) generalization and enabling the detection of OOD samples via Bayesian principles. An existing approach to diverse ensemble training encourages the models to disagree on provided OOD samples. However, that approach is computationally expensive and requires well-separated in-distribution (ID) and OOD examples, so it has only been demonstrated in small-scale settings.

Method. This work presents a method for Scalable Ensemble Diversification (SED) applicable to large-scale settings (e.g. ImageNet) that does not require OOD samples. Instead, SED identifies hard training samples on the fly and encourages the ensemble members to disagree on these. To improve scaling, we show how to avoid the exhaustive pairwise disagreement computations across models that make existing methods expensive.

Results. We evaluate the benefits of diversification with experiments on ImageNet. First, for OOD generalization, we observe large benefits from the diversification in multiple settings, including output-space (classical) ensembles and weight-space ensembles (model soups). Second, for OOD detection, we turn the diversity of ensemble hypotheses into a novel uncertainty score estimator that surpasses a large number of OOD detection baselines. Code is available here: https://github.com/AlexanderRubinstein/diverse-universe-public.
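The abstract describes three ingredients: weighting training samples by how hard they are, penalizing agreement between ensemble members on those samples, and avoiding a sum over all model pairs. The sketch below illustrates that general idea in plain Python and is not the authors' implementation. Several choices are assumptions made only for illustration: hardness is taken to be the ensemble-average cross-entropy, agreement between two members is measured as the inner product of their class probabilities, a single random pair of members stands in for the exhaustive pairwise sum, and `diversity_score` is a simple disagreement statistic that is not claimed to match the paper's PDS. All function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hardness_weights(probs, labels):
    """Per-sample weights proportional to the ensemble-average cross-entropy.

    Assumption: "hard" means "high training loss"; the paper may differ.
    probs[m][i] is the class distribution of member m on sample i.
    """
    n_models = len(probs)
    losses = []
    for i, y in enumerate(labels):
        ce = [-math.log(max(probs[m][i][y], 1e-12)) for m in range(n_models)]
        losses.append(sum(ce) / n_models)
    total = sum(losses)
    if total <= 0:  # every sample fit perfectly: fall back to uniform weights
        return [1.0 / len(losses)] * len(losses), losses
    return [loss / total for loss in losses], losses


def diversified_loss(ensemble_logits, labels, lam=1.0, rng=None):
    """Fit term plus lam times a hardness-weighted agreement penalty.

    Assumption: one randomly drawn pair of members replaces the sum over
    all pairs, so the penalty cost does not grow with the ensemble size.
    """
    rng = rng or random.Random(0)
    probs = [[softmax(z) for z in member] for member in ensemble_logits]
    weights, losses = hardness_weights(probs, labels)
    fit = sum(losses) / len(losses)
    a, b = rng.sample(range(len(probs)), 2)  # needs at least two members
    agreement = sum(
        w * sum(p * q for p, q in zip(probs[a][i], probs[b][i]))
        for i, w in enumerate(weights)
    )
    return fit + lam * agreement, weights


def diversity_score(member_probs):
    """Sum over classes of the maximum member probability for one input.

    Equals 1.0 when all members output the same distribution and grows as
    they put mass on different classes (illustrative statistic only).
    """
    n_classes = len(member_probs[0])
    return sum(max(p[c] for p in member_probs) for c in range(n_classes))
```

Calling `diversified_loss(ensemble_logits, labels, lam=0.5)` with `ensemble_logits[m][i]` holding the logits of member `m` on sample `i` returns the scalar loss and the per-sample weights. Samples that the ensemble fits poorly receive larger weights, so the agreement penalty concentrates on them, and an input on which members disagree receives a higher `diversity_score` than one on which they agree.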

Paper Structure (21 sections, 8 equations, 5 figures, 10 tables)

Introduction
Diverse Ensembles through Prediction Disagreement
Proposed Method
Dynamic Selection of Hard Examples
Tricks to Improve Scalability
Predictive Diversity Score (PDS) for OOD Detection
Experiments
Experimental Setup
Diversification
OOD Generalization
OOD Detection
Related Work
Conclusions
Appendices
Varying the Number of Trainable Layers
...and 6 more sections

Figures (5)

Figure 1: Existing diversification methods (top) require distinct (unlabeled) OOD examples on which the models are encouraged to disagree. Our Scalable Ensemble Diversification (SED, bottom) instead encourages the models to diverge on hard examples identified within a single standard training set.
Figure 2: ImageNet-R examples leading to the greatest and least disagreement. We show the 5 most divergent and 5 least divergent samples according to the SED ensemble. We measure prediction diversity with the Predictive Diversity Score (PDS), introduced in the section "Predictive Diversity Score (PDS) for OOD Detection". GT refers to the ground truth category. Ensemble predictions are shown in bold; where ensemble members predict classes different from the ensemble prediction, those classes are listed on the next line in regular font.
Figure 3: Impact of the diversity regularizer on OOD detection. We show prediction diversity, measured by PDS, and OOD detection performance, measured by AUROC, as functions of $\lambda$, the loss weight of the disagreement regularizer term.
Figure 4: Impact of ensemble size on OOD detection.
