
Taylor Outlier Exposure

Kohei Fukuda, Hiroaki Aizawa

TL;DR

This work tackles OOD detection when the auxiliary OOD data are contaminated by ID samples. It introduces Taylor Outlier Exposure (TaylorOE), a polynomial regularization $\mathcal{L}_{toe}$ derived from a finite-order Taylor expansion of the standard OE term $\mathcal{L}_{oe}$, with the order $t$ controlling the regularization strength. Empirically, TaylorOE consistently improves OOD detection over conventional OE across CIFAR-10/100 with noisy OOD data, and it remains effective when integrated with advanced OE methods such as Resampling and DivOE. The approach reduces reliance on perfectly clean OOD data, enabling scalable training on mixed data and offering practical benefits for robust OOD generalization, though hyperparameter tuning remains important depending on the contamination level and dataset characteristics.
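As a worked form of the construction summarized above, the following assumes that the standard OE term is the cross-entropy between the softmax output $p(x)$ over $K$ classes and the uniform distribution. It also assumes that TaylorOE replaces each $-\log p_k$ with its Taylor series around $p = 1$, as Figure 5 describes, truncated at order $t$. The exact normalization and weighting used by the authors may differ.

$$\mathcal{L}_{oe}(x) = -\frac{1}{K}\sum_{k=1}^{K}\log p_k(x), \qquad -\log p = \sum_{j=1}^{\infty}\frac{(1-p)^j}{j} \quad (0 < p \le 1),$$

$$\mathcal{L}_{toe}(x) = \frac{1}{K}\sum_{k=1}^{K}\sum_{j=1}^{t}\frac{\bigl(1-p_k(x)\bigr)^j}{j} \;\le\; \sum_{j=1}^{t}\frac{1}{j}.$$

Every term is nonnegative and $(1-p_k)^j \le 1$, so an ID sample whose off-target probabilities approach 0 contributes at most the harmonic number $H_t = \sum_{j=1}^{t} 1/j$, whereas $\mathcal{L}_{oe}$ grows without bound. Increasing $t$ raises this cap and recovers $\mathcal{L}_{oe}$ as $t \to \infty$.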

Abstract

Out-of-distribution (OOD) detection is the task of identifying data sampled from distributions that were not used during training. This task is essential for reliable machine learning and for a better understanding of models' generalization capabilities. Among OOD detection methods, Outlier Exposure (OE) significantly enhances OOD detection performance and generalization ability by exposing the model to auxiliary OOD data. However, OE depends on clean auxiliary OOD datasets that are uncontaminated by in-distribution (ID) samples; a noisy OOD dataset contaminated with ID samples generally harms OE training dynamics and final detection performance. Furthermore, as dataset scale increases, constructing clean OOD data becomes increasingly challenging and costly. To address these challenges, we propose Taylor Outlier Exposure (TaylorOE), an OE-based approach with a regularization term that allows training on noisy OOD datasets contaminated with ID samples. Specifically, we represent the OE regularization term as a polynomial function via a Taylor expansion, which lets us control the regularization strength for ID data in the auxiliary OOD dataset by adjusting the order of the Taylor expansion. In experiments on OOD detection with clean and noisy OOD datasets, we demonstrate that the proposed method consistently outperforms conventional methods, and we analyze our regularization term to show its effectiveness. Our implementation of TaylorOE is available at https://github.com/fukuchan41/TaylorOE.
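A minimal sketch of the two regularization terms on a single softmax output, in plain Python. It assumes OE is the cross-entropy between the softmax output and the uniform distribution. It also assumes TaylorOE replaces each $-\log p_k$ in that term with the order-$t$ Taylor polynomial around $p = 1$. The function names are hypothetical, and this is not the authors' released code. In standard OE, such a term is averaged over auxiliary samples and added to the ID cross-entropy with a weight.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def oe_loss(probs: Sequence[float]) -> float:
    """OE term (assumed form): cross-entropy from the uniform distribution to probs."""
    k = len(probs)
    return -sum(math.log(p) for p in probs) / k


def taylor_neg_log(p: float, t: int) -> float:
    """Order-t Taylor polynomial of -log(p) around p = 1: sum_{j=1}^{t} (1 - p)^j / j."""
    return sum((1.0 - p) ** j / j for j in range(1, t + 1))


def toe_loss(probs: Sequence[float], t: int) -> float:
    """Hypothetical TaylorOE-style term: each -log p_k in oe_loss replaced by its order-t polynomial."""
    k = len(probs)
    return sum(taylor_neg_log(p, t) for p in probs) / k


if __name__ == "__main__":
    # ID-like sample: probability mass concentrated on one of 10 classes.
    peaked = softmax([12.0] + [0.0] * 9)
    # OOD-like sample: probability mass spread evenly over 10 classes.
    flat = softmax([0.0] * 10)
    for name, probs in (("peaked", peaked), ("flat", flat)):
        taylor = ", ".join(f"t={t}: {toe_loss(probs, t):.4f}" for t in (1, 2, 4))
        print(f"{name}: L_oe={oe_loss(probs):.4f}  L_toe[{taylor}]")
```

With the peaked logits, `oe_loss` is dominated by the nine near-zero probabilities, while `toe_loss` stays below $\sum_{j=1}^{t} 1/j$ for each order `t`. For the flat logits, the two terms are close and converge as `t` grows.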

Paper Structure

This paper contains 17 sections, 5 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Overview of problem setting and our idea. We consider an auxiliary OOD dataset for OE that includes not only OOD data (blue circles) but also ID data (red crosses). Our TaylorOE $\mathcal{L}_{\text{toe}}$, a polynomial loss function derived from Taylor expansion, suppresses the influence of the ID data in the noisy OOD dataset by adjusting the regularization strength based on the order $t$.
  • Figure 2: Comparison of TaylorOE with OE [outlier_exposure] and WOODS [woods]. The noise ratio $\pi$ represents the proportion of clean OOD data in the auxiliary OOD dataset. TaylorOE outperforms these conventional methods across various noise ratios, demonstrating that it remains effective even with noisy data.
  • Figure 3: Histograms of maximum softmax probabilities (left) and of the OE term $\mathcal{L}_{oe}$ (right) for ID and OOD data, obtained from a model pretrained on ID data. For ID data, the maximum softmax probability of most samples is close to 1, so the probabilities of the other classes are close to 0; the cross-entropy to the uniform distribution, and hence $\mathcal{L}_{oe}$, is therefore large. Clean OOD data, in contrast, have fewer samples with maximum softmax probability close to 1, and their probability mass is spread over several classes, so $\mathcal{L}_{oe}$ is smaller.
  • Figure 4: Softmax probabilities for ID and OOD samples (left), $-\log p$ for each class (center), and the Taylor-series approximation of $-\log p$ used in the proposed method (right). For ID samples, the predicted probability is concentrated on a single class, so $-\log p$ is large for all other classes. By truncating the Taylor expansion at a finite order, the proposed method keeps the information from classes with small predicted probabilities from becoming excessively large. For OOD samples, classes whose information is not excessively large, such as the fifth and sixth classes, retain it.
  • Figure 5: Taylor expansion of $-\log p$ around $p=1$. The Taylor polynomial closely matches $-\log p$ near $p = 1$ and lies below it near $p = 0$. As the order of the expansion increases, the polynomial's value near $p = 0$ gradually approaches the original function.
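The behavior that Figures 4 and 5 describe can be checked numerically. The sketch below compares $-\log p$ with its truncated Taylor polynomials around $p = 1$. The probabilities and orders are arbitrary illustrations, not settings from the paper, and the helper names are hypothetical.

```python
import math
from typing import Dict, List


def taylor_neg_log(p: float, t: int) -> float:
    """Order-t Taylor polynomial of -log(p) around p = 1: sum_{j=1}^{t} (1 - p)^j / j."""
    return sum((1.0 - p) ** j / j for j in range(1, t + 1))


def compare(ps: List[float], orders: List[int]) -> Dict[float, Dict[str, float]]:
    """For each probability p, return -log p alongside its truncated expansions."""
    rows: Dict[float, Dict[str, float]] = {}
    for p in ps:
        row = {"-log p": -math.log(p)}
        for t in orders:
            row[f"t={t}"] = taylor_neg_log(p, t)
        rows[p] = row
    return rows


if __name__ == "__main__":
    table = compare([1e-4, 0.1, 0.5, 0.9, 0.99], [1, 2, 4, 8])
    for p, row in table.items():
        cells = "  ".join(f"{name}: {value:.4f}" for name, value in row.items())
        print(f"p={p:<7g} {cells}")
```

Near $p = 1$, even low orders track $-\log p$ closely. Near $p = 0$, each polynomial stays well below $-\log p$ and is capped by $\sum_{j=1}^{t} 1/j$, and that cap rises with the order.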