Energy-based Hopfield Boosting for Out-of-Distribution Detection

Claus Hofmann, Simon Schmid, Bernhard Lehner, Daniel Klotz, Sepp Hochreiter

TL;DR

Hopfield Boosting introduces an energy-based boosting framework that uses Modern Hopfield Energy (MHE) to learn a boundary around in-distribution data by sampling informative auxiliary outliers near the boundary. The method trains a classifier with an OOD loss that explicitly minimizes a boundary-focused energy $\mathrm{E}_b$, while adaptively reweighting AUX samples to emphasize hard cases. Empirical results on CIFAR-10/100 and ImageNet-1K show state-of-the-art improvements in FPR95 and AUROC compared to eight OE-based baselines, with ablations confirming the necessity of weighted sampling, the projection head, and the OOD loss. The approach scales to large datasets and maintains modest inference overhead, offering a principled, differentiable alternative to post-hoc OOD scoring and prior MHN-based methods. Overall, Hopfield Boosting sharpens the ID–OOD boundary by combining energy-based similarity with an adaptive, boundary-driven sampling strategy, yielding robust OOD detection in practical vision tasks.

Abstract

Out-of-distribution (OOD) detection is critical when deploying machine learning models in the real world. Outlier exposure methods, which incorporate auxiliary outlier data in the training process, can drastically improve OOD detection performance compared to approaches without advanced training strategies. We introduce Hopfield Boosting, a boosting approach, which leverages modern Hopfield energy (MHE) to sharpen the decision boundary between the in-distribution and OOD data. Hopfield Boosting encourages the model to concentrate on hard-to-distinguish auxiliary outlier examples that lie close to the decision boundary between in-distribution and auxiliary outlier data. Our method achieves a new state-of-the-art in OOD detection with outlier exposure, improving the FPR95 metric from 2.28 to 0.92 on CIFAR-10 and from 11.76 to 7.94 on CIFAR-100.
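The two metrics named in this summary, FPR95 and AUROC, can be computed from per-sample detection scores. The following is a minimal sketch using the standard definitions, not the authors' evaluation code. The function names and the convention that a higher score means "more in-distribution" are assumptions made for this illustration.

```python
import numpy as np

def fpr_at_95_tpr(id_scores, ood_scores):
    """FPR95: fraction of OOD samples accepted as ID at the threshold
    that keeps roughly 95% of the ID samples.

    Assumed convention: higher score = more in-distribution.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # 5th percentile of the ID scores: about 95% of ID samples lie at or above it.
    threshold = np.percentile(id_scores, 5)
    return float(np.mean(ood_scores >= threshold))

def auroc(id_scores, ood_scores):
    """AUROC as P(ID score > OOD score), counting ties as 1/2.

    Uses an all-pairs comparison, so memory grows with len(id) * len(ood).
    """
    id_col = np.asarray(id_scores, dtype=float)[:, None]
    ood_row = np.asarray(ood_scores, dtype=float)[None, :]
    wins = np.mean(id_col > ood_row)
    ties = np.mean(id_col == ood_row)
    return float(wins + 0.5 * ties)
```

Both functions return fractions in [0, 1]. The FPR95 figures quoted above appear to be percentages, so multiply by 100 before comparing against them.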

Paper Structure

This paper contains 87 sections, 1 theorem, 115 equations, 15 figures, 10 tables, and 1 algorithm.

Key Result

Lemma J.1

(see Lemma E.1 in Ming:22) Assume the $M$ sampled data points $\bm{o}_i \sim p_\text{AUX}$ satisfy the following constraint on high boundary scores $\mathrm{E}_b(\bm{\xi})$: [constraint not preserved in this extract]. Then they have: [statement not preserved in this extract].

Figures (15)

  • Figure 1: The Hopfield Boosting concept. The first step (weight) creates weak learners by first choosing in-distribution samples (ID, orange) and then choosing auxiliary outlier samples (AUX, blue) according to their assigned probabilities; the second step (evaluate) computes the losses for the resulting predictions (see the method section); and the third step (update) assigns new probabilities to the AUX samples according to their position on the hypersphere (see Figure 2).
  • Figure 2: Synthetic example of the adaptive resampling mechanism. Hopfield Boosting forms a strong learner by sampling and combining a set of weak learners close to the decision boundary. The heatmap in the background shows $\exp(\beta \mathrm{E}_b(\bm{\xi}; \bm{X}, \bm{O}))$ with $\beta = 60$. Only the sampled (i.e., highlighted) points serve as memories $\bm{X}$ and $\bm{O}$.
  • Figure 3: Depiction of the energy function $\mathrm{E}_b(\bm{\xi}; \bm{X}, \bm{O})$ on a hypersphere. (a) shows $\mathrm{E}_b(\bm{\xi}; \bm{X}, \bm{O})$ with exemplary inlier (orange) and outlier (blue) points; (b) shows $\exp(\beta \mathrm{E}_b(\bm{\xi}; \bm{X}, \bm{O}))$ with $\beta = 128$. Both (a) and (b) show the sphere rotated by 0, 90, 180, and 270 degrees around the vertical axis.
  • Figure 4: $\mathcal{L}_{\text{OOD}}$ applied to exemplary data points in Euclidean space. Gradient updates are applied to the data points directly. We observe that the variance orthogonal to the decision boundary shrinks, while the variance parallel to the decision boundary does not shrink to the same extent. $\beta$ is set to 2.
  • Figure 5: $\mathcal{L}_{\text{OOD}}$ applied to exemplary data points on a sphere. Gradients are applied to the data points directly. We observe that the geometry of the space forces the patterns to opposing poles of the sphere.
  • ...and 10 more figures
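
The quantity $\exp(\beta \mathrm{E}_b(\bm{\xi}; \bm{X}, \bm{O}))$ shown in the Figure 2 and Figure 3 heatmaps, and the reweighting of AUX samples described for Figure 1, can be sketched in a few lines of NumPy. This summary does not state the formula for $\mathrm{E}_b$, so the sketch assumes the form $\mathrm{E}_b(\bm{\xi}; \bm{X}, \bm{O}) = -2\,\mathrm{lse}(\beta, (\bm{X} \Vert \bm{O})^T \bm{\xi}) + \mathrm{lse}(\beta, \bm{X}^T \bm{\xi}) + \mathrm{lse}(\beta, \bm{O}^T \bm{\xi})$ with $\mathrm{lse}(\beta, \bm{z}) = \beta^{-1} \log \sum_i \exp(\beta z_i)$. It also assumes that sampling probabilities are proportional to $\exp(\beta \mathrm{E}_b)$. All function names, the toy data, and the value of $\beta$ are hypothetical.

```python
import numpy as np

def lse(beta, z):
    """Scaled log-sum-exp (1/beta) * log(sum_i exp(beta * z_i)) over the last axis."""
    m = np.max(beta * z, axis=-1, keepdims=True)  # subtract the max for numerical stability
    return (m.squeeze(-1) + np.log(np.sum(np.exp(beta * z - m), axis=-1))) / beta

def boundary_energy(xi, X, O, beta):
    """Assumed form of E_b for queries xi (n, d), ID memories X (N, d), AUX memories O (M, d)."""
    sim_x = xi @ X.T                                    # (n, N) similarities to ID memories
    sim_o = xi @ O.T                                    # (n, M) similarities to AUX memories
    sim_all = np.concatenate([sim_x, sim_o], axis=-1)   # similarities to X || O
    return -2.0 * lse(beta, sim_all) + lse(beta, sim_x) + lse(beta, sim_o)

def aux_sampling_probs(pool, X, O, beta):
    """Probabilities proportional to exp(beta * E_b(o_i; X, O)) over AUX candidates (assumption)."""
    logits = beta * boundary_energy(pool, X, O, beta)
    logits = logits - logits.max()                      # stable softmax
    w = np.exp(logits)
    return w / w.sum()

def normalize(v):
    """Project rows onto the unit hypersphere."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Toy data: ID and AUX memories clustered around opposite poles of the sphere.
rng = np.random.default_rng(0)
shift = np.array([3.0] + [0.0] * 7)
X = normalize(rng.normal(size=(32, 8)) + shift)
O = normalize(rng.normal(size=(32, 8)) - shift)
pool = normalize(rng.normal(size=(256, 8)))             # AUX candidates to reweight

probs = aux_sampling_probs(pool, X, O, beta=4.0)
idx = rng.choice(len(pool), size=32, replace=True, p=probs)  # indices of sampled AUX points
```

Under the assumed form, $\mathrm{E}_b$ is largest (equal to $-2\beta^{-1}\log 2$) where a query is equally similar to the ID and AUX memories. Candidates near the boundary therefore receive the highest sampling probability, matching the captions' description of sampling close to the decision boundary.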

Theorems & Definitions (2)

  • Lemma J.1
  • proof