ALTBI: Constructing Improved Outlier Detection Models via Optimization of Inlier-Memorization Effect

Seoyoung Cho; Jaesung Hwang; Kwan-Young Bak; Dongha Kim

TL;DR

ALTBI addresses unsupervised outlier detection (UOD) by maximizing the inlier-memorization (IM) effect in deep generative models. It combines three components: a gradually increasing mini-batch size, an adaptive threshold for loss truncation, and an ensemble of losses across updates. Together these emphasize inliers early in training and suppress the influence of outliers. Experiments on 57 datasets show state-of-the-art outlier detection performance at lower computational cost, and the approach remains robust under differential privacy constraints. The theoretical analysis shows that outliers are progressively excluded from the truncated loss and that the inlier risk converges under standard optimization assumptions, which supports ALTBI's practical reliability for UOD tasks.
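The ensemble component can be illustrated with a minimal sketch. It assumes, hypothetically, that each sample's outlier score is the plain average of its per-sample losses recorded at several updates. ALTBI's exact aggregation may differ, and the function name `ensemble_outlier_scores` is illustrative. Under the IM effect, inliers are fit earlier and keep lower losses, so a higher averaged loss suggests a likelier outlier.

```python
from statistics import fmean


def ensemble_outlier_scores(loss_history):
    """Average each sample's loss over several recorded updates (assumed rule).

    loss_history: list of snapshots, one per recorded update; each snapshot
    lists the per-sample losses of all training samples, in the same order.
    Returns one score per sample; a larger score suggests an outlier.
    """
    if not loss_history:
        raise ValueError("need at least one loss snapshot")
    n = len(loss_history[0])
    if any(len(snapshot) != n for snapshot in loss_history):
        raise ValueError("every snapshot must cover the same samples")
    # zip(*...) regroups the snapshots so each tuple holds one sample's losses
    return [fmean(per_sample) for per_sample in zip(*loss_history)]


# Illustrative losses for 4 samples at 3 updates; sample 3 stays poorly fit.
history = [
    [1.0, 1.2, 0.9, 3.0],
    [0.6, 0.8, 0.7, 2.8],
    [0.5, 0.4, 0.6, 2.6],
]
scores = ensemble_outlier_scores(history)
most_outlying = max(range(len(scores)), key=scores.__getitem__)
print(most_outlying)  # 3
```

Averaging over several updates smooths out the noise of any single snapshot, so one unlucky update is less likely to flip a sample's ranking.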

Abstract

Outlier detection (OD) is the task of identifying unusual observations (or outliers) in given or upcoming data by learning the distinctive patterns of normal observations (or inliers). Recently, a study introduced a powerful unsupervised OD (UOD) solver based on a new observation about deep generative models, called the inlier-memorization (IM) effect, which suggests that generative models memorize inliers before outliers in early learning stages. In this study, we aim to develop a theoretically principled method for UOD tasks that makes maximal use of the IM effect. We first observe that the IM effect is more pronounced when the training data contain fewer outliers. This suggests that the IM effect can be strengthened in UOD regimes if outliers are effectively excluded from mini-batches when the loss function is computed. To this end, we introduce two main techniques: 1) increasing the mini-batch size as model training proceeds and 2) using an adaptive threshold to calculate the truncated loss function. We theoretically show that these two techniques effectively filter outliers out of the truncated loss function, allowing us to exploit the IM effect to the fullest. Combining them with an additional ensemble strategy, we propose a method termed Adaptive Loss Truncation with Batch Increment (ALTBI). Extensive experiments demonstrate that ALTBI achieves state-of-the-art performance in identifying outliers compared with other recent methods, at significantly lower computational cost. Additionally, we show that our method yields robust performance when combined with privacy-preserving algorithms.
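The two techniques can be sketched in plain Python. Both rules below are assumptions chosen for illustration, not the authors' definitions. The batch size grows geometrically, $n_t = \min(n_{\max}, \lceil n_0 \gamma^{t-1} \rceil)$ with $\gamma > 1$, and the adaptive threshold $\tau$ is the empirical $\rho$-quantile of the current mini-batch losses, so only the smaller losses enter the truncated loss. The names `batch_size`, `truncated_loss`, `n0`, `gamma`, `n_max`, and `rho` are hypothetical.

```python
import math


def batch_size(t, n0, gamma, n_max):
    """Assumed geometric schedule: n_t = min(n_max, ceil(n0 * gamma**(t - 1)))."""
    return min(n_max, math.ceil(n0 * gamma ** (t - 1)))


def truncated_loss(losses, rho):
    """Average only the per-sample losses at or below an adaptive threshold.

    tau is the empirical rho-quantile of this mini-batch's losses (assumed rule),
    so the cutoff is recomputed at every update instead of being fixed.
    Returns (truncated mean, tau, indices of the kept samples).
    """
    if not losses:
        raise ValueError("mini-batch is empty")
    if not 0 < rho <= 1:
        raise ValueError("rho must lie in (0, 1]")
    ordered = sorted(losses)
    k = max(1, math.ceil(rho * len(ordered)))  # how many of the smallest losses to keep
    tau = ordered[k - 1]
    kept = [i for i, loss in enumerate(losses) if loss <= tau]  # ties at tau are kept
    return sum(losses[i] for i in kept) / len(kept), tau, kept


# Batch sizes for the first five updates (t starts at 1).
sizes = [batch_size(t, n0=64, gamma=1.5, n_max=300) for t in range(1, 6)]
print(sizes)  # [64, 96, 144, 216, 300]

# A mini-batch in which the last sample has a much larger loss.
mean, tau, kept = truncated_loss([0.2, 0.5, 0.3, 4.0], rho=0.75)
print(tau, kept)  # 0.5 [0, 1, 2]
```

Because `tau` is recomputed from each mini-batch, the cutoff follows the current loss values rather than staying fixed; in the example, the high-loss sample is dropped from the average.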

Paper Structure (33 sections, 5 theorems, 44 equations, 12 figures, 7 tables, 1 algorithm)

Introduction
Outlier detection:
Improvement of IM effect:
Related works
Detailed description of ALTBI
Preliminaries
Notations and definitions
Brief review of ODIM
Relationship between IM effect and outlier ratio
Proposed method
Mini-batch increment and adaptive threshold
Ensemble within a single model
Choice of DGM framework
Theoretical analysis
Experiments
...and 18 more sections

Key Result

Proposition 4.1

At the $t$-th update, suppose that the current parameter $\theta_{t-1}$ satisfies $a_1 \le L_i(\theta_{t-1}) \le a_2\gamma^{-(t-1)}$. For a mini-batch $\mathcal{D}_t$, denote by $\mathcal{A}_t^{\tau}$ the set of inliers included in the truncated loss. Similarly, define $\mathcal{B}_t$ for the outliers included in the truncated loss. … with probability at least $1-\delta$.
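The sets in Proposition 4.1 can be made concrete with a small diagnostic sketch. It assumes that a sample enters the truncated loss exactly when its loss is at most a threshold `tau`, and it reads $\mathcal{B}_t$ as the outlier counterpart of $\mathcal{A}_t^{\tau}$; both are interpretive assumptions. It also needs ground-truth labels, so it applies only to analysis on labelled benchmarks, not to unsupervised training. The function name `truncated_membership` is hypothetical. The returned ratio corresponds to the outlier ratio among truncated samples traced in Figure 4.

```python
def truncated_membership(losses, is_outlier, tau):
    """Split the samples kept by truncation (loss <= tau) by their true label.

    Returns (inliers_kept, outliers_kept, ratio), where the first two play the
    roles of A_t^tau and B_t and ratio = |B_t| / (|A_t^tau| + |B_t|).
    Labels are needed, so this is an evaluation diagnostic only.
    """
    if len(losses) != len(is_outlier):
        raise ValueError("losses and labels must have the same length")
    kept = [i for i, loss in enumerate(losses) if loss <= tau]
    inliers_kept = [i for i in kept if not is_outlier[i]]
    outliers_kept = [i for i in kept if is_outlier[i]]
    ratio = len(outliers_kept) / len(kept) if kept else 0.0
    return inliers_kept, outliers_kept, ratio


# Illustrative mini-batch: sample 2 is an outlier removed by truncation,
# sample 4 is an outlier that still falls under the threshold.
losses = [0.3, 0.4, 2.5, 0.6, 0.9]
is_outlier = [False, False, True, False, True]
inliers_kept, outliers_kept, ratio = truncated_membership(losses, is_outlier, tau=1.0)
print(inliers_kept, outliers_kept, ratio)  # [0, 1, 3] [4] 0.25
```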

Figures (12)

Figure 1: An illustration of ALTBI.
Figure 2: Relationship between the outlier ratio in training data and IM effect.
Figure 3: Outlier detection AUC values for DGMs with (green) and without (orange) mini-batch increment and adaptive threshold. Clockwise from upper left: Ionosphere, Letter, Vowels, and MagicGamma datasets.
Figure 4: Trace plot of the outlier ratio among truncated samples across iterations for two datasets: Cardio (left) and Shuttle (right).
Figure 5: Averaged AUC results (means and standard deviations) across 57 datasets from ADBench over three different implementations. An asterisk (*) marks methods that we implemented ourselves. Color scheme: red (IM-based), orange (diffusion-based), green (deep-learning-based), blue (machine-learning-based).
...and 7 more figures
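Figures 3 and 5 report AUC values. As a reminder of what that number measures, the area under the ROC curve of an outlier score equals the probability that a randomly chosen outlier receives a higher score than a randomly chosen inlier, with ties counted as one half. The sketch below computes that identity directly in quadratic time; the function name `roc_auc` is hypothetical, and a sorting-based implementation would be preferable for large datasets.

```python
def roc_auc(scores, is_outlier):
    """ROC AUC for outlier scores (higher score = more outlying).

    Mann-Whitney form: fraction of (outlier, inlier) pairs in which the
    outlier scores higher, counting ties as 1/2. O(n_out * n_in) time.
    """
    pos = [s for s, y in zip(scores, is_outlier) if y]
    neg = [s for s, y in zip(scores, is_outlier) if not y]
    if not pos or not neg:
        raise ValueError("need at least one outlier and one inlier")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Both outliers score above both inliers, so the ranking is perfect.
print(roc_auc([0.9, 0.2, 0.4, 0.8], [True, False, False, True]))  # 1.0
```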

Theorems & Definitions (6)

Remark 3.1
Proposition 4.1
Proposition 4.2
Lemma 1.1
Lemma 1.2: Conditional version of Theorem 3.6 in chung2006concentration
Lemma 1.3: Conditional version of Lemma 4 in ghadimi2016mini
