
Outlier Synthesis via Hamiltonian Monte Carlo for Out-of-Distribution Detection

Hengzhuang Li, Teng Zhang

TL;DR

This work tackles robust out-of-distribution (OOD) detection without relying on natural outliers. It proposes HamOS, a Hamiltonian Monte Carlo–driven framework that synthesizes diverse virtual outliers directly from in-distribution (ID) data. HamOS projects embeddings onto a unit hypersphere and samples via Markov chains between nearby ID clusters, generating outliers with varying degrees of OOD-ness guided by a kNN-based density; a dual-head architecture and a carefully designed loss function promote strong ID–OOD separation. The approach achieves state-of-the-art performance on standard benchmarks (CIFAR-10/100) and scales to large datasets (ImageNet-1K) while maintaining competitive ID accuracy and remaining robust to different sampling and scoring choices. Overall, HamOS offers a flexible, efficient, and broadly applicable strategy for improving OOD detection through explicit, diverse outlier synthesis grounded in ID priors.
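The kNN-based notion of OOD-ness can be illustrated with a small sketch. This is a hypothetical illustration, not the density HamOS actually defines: it assumes OOD-ness is measured as the Euclidean distance from an L2-normalized (hypersphere-projected) embedding to its k-th nearest normalized ID embedding, in the spirit of kNN-based OOD scoring. The function names `normalize` and `knn_oodness` and the toy embeddings are assumptions introduced only for illustration.

```python
import math


def normalize(v):
    """Project a vector onto the unit hypersphere via L2 normalization."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def knn_oodness(z, id_bank, k):
    """Hypothetical OOD-ness score (assumption, not the paper's formula).

    Returns the Euclidean distance from the normalized query z to its
    k-th nearest normalized ID embedding; larger means more OOD.
    """
    if not 1 <= k <= len(id_bank):
        raise ValueError("k must be between 1 and the number of ID embeddings")
    z_unit = normalize(z)
    dists = sorted(math.dist(z_unit, normalize(e)) for e in id_bank)
    return dists[k - 1]


# Toy ID embeddings forming two clusters near the x- and y-axes.
id_bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
near = knn_oodness([1.0, 0.05], id_bank, k=2)    # close to the first cluster
far = knn_oodness([-1.0, -1.0], id_bank, k=2)    # opposite side of the sphere
print(f"near-ID score: {near:.3f}, far-from-ID score: {far:.3f}")
```

Points far from every ID embedding on the sphere receive larger scores, and because all vectors are unit-norm the scores lie in [0, 2]. A larger k averages out the influence of individual ID points at the cost of locality.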

Abstract

Out-of-distribution (OOD) detection is crucial for developing trustworthy and reliable machine learning systems. Recent advances in training with auxiliary OOD data have demonstrated its efficacy in enhancing detection capabilities. Nonetheless, these methods rely heavily on acquiring a large pool of high-quality natural outliers. Some prior methods try to alleviate this problem by synthesizing virtual outliers, but they suffer from either poor quality, owing to monotonous sampling strategies, or high cost, owing to heavily parameterized generative models. In this paper, we overcome these problems by proposing the Hamiltonian Monte Carlo Outlier Synthesis (HamOS) framework, which views the synthesis process as sampling from Markov chains. Based solely on the in-distribution data, the Markov chains can extensively traverse the feature space and generate diverse and representative outliers, hence exposing the model to a wide range of potential OOD scenarios. Because Hamiltonian Monte Carlo achieves a sampling acceptance rate close to 1, the framework is also highly efficient. Experiments against state-of-the-art (SOTA) baselines on both standard and large-scale benchmarks verify the efficacy and efficiency of HamOS.
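The efficiency claim rests on Hamiltonian Monte Carlo's high acceptance rate. The sketch below is a generic textbook HMC sampler (leapfrog integrator plus a Metropolis correction), not the HamOS sampler: it runs in unconstrained Euclidean space on a toy standard-Gaussian target rather than on the hypersphere with the paper's OOD-ness density. The names `hmc_sample`, `hamiltonian`, `std_normal_log_prob`, and `std_normal_grad` are hypothetical. It shows why acceptance stays near 1: the leapfrog integrator nearly conserves the Hamiltonian, so the Metropolis ratio exp(H_old - H_new) is close to 1 for small step sizes.

```python
import math
import random


def hamiltonian(log_prob, x, p):
    """Total energy: potential -log_prob(x) plus Gaussian kinetic energy |p|^2 / 2."""
    return -log_prob(x) + 0.5 * sum(pi * pi for pi in p)


def hmc_sample(log_prob, grad_log_prob, x0, n_samples, step_size, n_leapfrog, seed=0):
    """Generic HMC sketch; returns (samples, acceptance_rate)."""
    rng = random.Random(seed)
    x = list(x0)
    samples, accepted = [], 0
    for _ in range(n_samples):
        # Draw a fresh standard-normal momentum for every proposal.
        p = [rng.gauss(0.0, 1.0) for _ in x]
        x_new = list(x)
        # Leapfrog: initial half momentum step ...
        p_new = [pi + 0.5 * step_size * gi for pi, gi in zip(p, grad_log_prob(x_new))]
        for step in range(n_leapfrog):
            # ... full position step ...
            x_new = [xi + step_size * pi for xi, pi in zip(x_new, p_new)]
            # ... full momentum step, except a half step at the very end.
            scale = step_size if step < n_leapfrog - 1 else 0.5 * step_size
            p_new = [pi + scale * gi for pi, gi in zip(p_new, grad_log_prob(x_new))]
        # Metropolis correction: accept with probability min(1, exp(H_old - H_new)).
        h_old = hamiltonian(log_prob, x, p)
        h_new = hamiltonian(log_prob, x_new, p_new)
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
            accepted += 1
        samples.append(list(x))
    return samples, accepted / n_samples


def std_normal_log_prob(x):
    """Unnormalized log-density of a standard Gaussian (toy target)."""
    return -0.5 * sum(xi * xi for xi in x)


def std_normal_grad(x):
    """Gradient of std_normal_log_prob."""
    return [-xi for xi in x]


samples, rate = hmc_sample(std_normal_log_prob, std_normal_grad,
                           x0=[3.0, -3.0], n_samples=2000,
                           step_size=0.1, n_leapfrog=10, seed=0)
print(f"acceptance rate: {rate:.3f}")
```

Shrinking `step_size` pushes the acceptance rate closer to 1 but requires more leapfrog steps (more gradient evaluations) to travel the same distance; this trade-off is what an HMC-based synthesis scheme has to balance.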

Paper Structure

This paper contains 35 sections, 26 equations, 13 figures, 24 tables, 2 algorithms.

Figures (13)

  • Figure 1: OOD detection performance on CIFAR and ImageNet. The size of the dots indicates AUROC w.r.t. ImageNet-1K.
  • Figure 2: OOD scores of virtual outliers synthesized with different methods.
  • Figure 3: Depiction of the HamOS training framework. We design a dual-head framework that uses a backbone for feature extraction. (1) The FC head preserves the original ID classification efficacy; (2) the projection head maps the feature embedding into a lower-dimensional hyperspherical space, where we explicitly generate outliers through Hamiltonian Monte Carlo guided by our OOD-ness estimation. This spherical space is shaped by an ID contrastive loss and an OOD discernment loss to enhance the separation between ID and OOD data.
  • Figure 4: HamOS synthesizes outliers of varying levels of OOD-ness: (a) OOD-ness density with illustrated synthesis process; (b) the OOD distribution of the virtual outliers at each synthesis round with different step sizes; (c) the ID probabilities and OOD scores of the synthesized outliers.
  • Figure 5: OOD performance is improved continuously with synthesized outliers.
  • ...and 8 more figures
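The dual-head design described for Figure 3 can be sketched as a forward pass that produces both outputs from one backbone feature. This is a toy illustration under stated assumptions, not the authors' architecture: both heads are assumed to be single linear maps with hand-written weights, and the names `dual_head_forward`, `W_fc`, and `W_proj` are hypothetical. It demonstrates only that the FC head yields class logits while the projection head yields a lower-dimensional, unit-norm embedding, i.e. a point on the hypersphere where outliers would be synthesized.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def l2_normalize(v):
    """Map a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("projection produced the zero vector")
    return [x / norm for x in v]


def dual_head_forward(feature, W_fc, W_proj):
    """Hypothetical dual-head forward pass (both heads assumed linear)."""
    logits = matvec(W_fc, feature)                     # FC head: ID class logits
    embedding = l2_normalize(matvec(W_proj, feature))  # projection head: unit-norm, lower-dim
    return logits, embedding


# Toy example: 4-dim backbone feature, 3 ID classes, 2-dim hyperspherical embedding.
feature = [1.0, 2.0, 0.5, -1.0]
W_fc = [[1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 1.0]]
W_proj = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0]]
logits, embedding = dual_head_forward(feature, W_fc, W_proj)
print("logits:", logits)
print("embedding:", embedding)
```

Keeping the two heads separate means the classification logits are computed independently of the normalized embedding, which is consistent with the caption's stated goal of preserving ID classification while the hyperspherical space is reshaped by the ID contrastive and OOD discernment losses.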