Fast Unsupervised Deep Outlier Model Selection with Hypernetworks

Xueying Ding, Yue Zhao, Leman Akoglu

TL;DR

The paper tackles unsupervised deep outlier model selection (UDOMS) by addressing two core challenges: the lack of labeled validation data and the combinatorial HP/model space. It introduces HYPER, a framework that uses a hypernetwork to generate DOD weights conditioned on HPs (including architectural choices), together with a proxy validator $f_{\text{val}}$ that is meta-learned offline on historical labeled datasets to predict detection performance without labels. Online, HYPER alternates between refining HPs within a local neighborhood and updating the hypernetwork to produce best-response weights, guided by $f_{\text{val}}$ and a local sampling objective whose entropy term encourages exploration. Extensive experiments on 35 OD tasks show that HYPER achieves strong detection performance with substantial offline and online speed-ups over baselines, including state-of-the-art meta-learning approaches, and remains effective on both tabular and image data when the historical data distributions are similar to the new task. The approach offers a practical, scalable path to deploying deep OD models with carefully tuned HPs in real-world unsupervised settings, and the code is open-source for reproducibility.
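The online phase can be pictured as a local search over a discrete HP grid. This is a minimal sketch under stated assumptions, not HYPER's implementation: the grid `HP_GRID`, the one-step `neighbors` rule, and `toy_proxy_validator` (a made-up stand-in for the meta-learned $f_{\text{val}}$) are all hypothetical, and the hypernetwork update is only marked by a comment. The entropy term is modeled with the standard result that maximizing $\sum_i p_i s_i + \tau H(p)$ over sampling distributions $p$ gives $p = \mathrm{softmax}(s/\tau)$, so a larger temperature $\tau$ means more exploration.

```python
import math
import random

# Hypothetical discrete HP grid; not the paper's actual search space.
HP_GRID = {
    "n_layers": [1, 2, 3, 4],
    "width": [8, 16, 32, 64],
    "dropout": [0.0, 0.1, 0.2, 0.3],
}


def neighbors(hp):
    """Configs that move exactly one HP to an adjacent grid value."""
    out = []
    for name, values in HP_GRID.items():
        i = values.index(hp[name])
        for j in (i - 1, i + 1):
            if 0 <= j < len(values):
                cand = dict(hp)
                cand[name] = values[j]
                out.append(cand)
    return out


def entropy_regularized_probs(scores, tau):
    """Maximizer of sum_i p_i * s_i + tau * H(p) over the simplex: softmax(s / tau)."""
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp((s - m) / tau) for s in scores]
    z = sum(w)
    return [x / z for x in w]


def toy_proxy_validator(hp):
    """Made-up stand-in for the meta-learned f_val (higher = better predicted performance)."""
    return (-(hp["n_layers"] - 3) ** 2
            - (math.log2(hp["width"]) - 5) ** 2
            - 10 * hp["dropout"])


def local_search(start, f_val, n_rounds=20, tau=0.5, seed=0):
    """Sample one neighbor per round with entropy-regularized probs; keep the best seen."""
    rng = random.Random(seed)
    current, best = dict(start), dict(start)
    for _ in range(n_rounds):
        cands = neighbors(current)
        probs = entropy_regularized_probs([f_val(c) for c in cands], tau)
        current = rng.choices(cands, weights=probs, k=1)[0]
        # In HYPER, the hypernetwork would be updated here so that it outputs
        # best-response weights for HPs near `current` (omitted in this sketch).
        if f_val(current) > f_val(best):
            best = dict(current)
    return best


best = local_search({"n_layers": 1, "width": 8, "dropout": 0.3}, toy_proxy_validator)
print(best)
```

Because `best` only changes when the proxy score strictly improves, the returned configuration is never predicted to be worse than the starting one; how far it moves depends on `tau` and `n_rounds`.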

Abstract

Outlier detection (OD) has many applications and a rich literature of numerous techniques. Deep neural network-based OD (DOD) has seen a recent surge of attention thanks to the many advances in deep learning. In this paper, we consider a critical-yet-understudied challenge with unsupervised DOD, namely, effective hyperparameter (HP) tuning/model selection. While several prior works report that OD models are sensitive to HPs, the problem becomes all the more critical for modern DOD models, which have a long list of HPs. We introduce HYPER for tuning DOD models, tackling two fundamental challenges: (1) validation without supervision (due to the lack of labeled anomalies), and (2) efficient search of the HP/model space (due to the exponential growth in the number of HPs). A key idea is to design and train a novel hypernetwork (HN) that maps HPs onto optimal weights of the main DOD model. In turn, HYPER capitalizes on a single HN that can dynamically generate weights for many DOD models (corresponding to varying HPs), which offers a significant speed-up. In addition, it employs meta-learning on historical OD tasks with labels to train a proxy validation function, which is likewise trained efficiently with our proposed HN. Extensive experiments on 35 OD tasks show that HYPER achieves high performance against 8 baselines with significant efficiency gains.
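How a single HN can serve many architectures can be illustrated with the padding-and-masking scheme from the Figure 2 caption. This is a toy sketch, not the paper's HN: `LinearHypernet` is a hypothetical one-layer linear map from a normalized HP vector to all main-model weights; `pad_arch` places zero-width layers in the middle (so $[3,5]$ becomes $[3,0,0,5]$); and an all-zero mask is assumed to make a layer a no-op by passing its input through unchanged. The tanh activations, the `hp_vector` encoding, and the sizes `MAX_LAYERS` and `MAX_WIDTH` are also assumptions.

```python
import math
import random

MAX_LAYERS = 4  # deepest architecture in the (hypothetical) search space
MAX_WIDTH = 5   # widest layer; also the input/output dimension here


def pad_arch(widths, max_layers=MAX_LAYERS):
    """Insert zero-width layers in the middle, e.g. [3, 5] -> [3, 0, 0, 5]."""
    k = len(widths)
    split = (k + 1) // 2
    return widths[:split] + [0] * (max_layers - k) + widths[split:]


def arch_masks(padded, max_width=MAX_WIDTH):
    """One 0/1 mask per layer keeping the first `w` units; w = 0 gives an all-zero mask."""
    return [[1.0] * w + [0.0] * (max_width - w) for w in padded]


def hp_vector(widths):
    """Hypothetical HP encoding: padded widths normalized to [0, 1]."""
    return [w / MAX_WIDTH for w in pad_arch(widths)]


class LinearHypernet:
    """Toy HN: a single linear layer from an HP vector to every main-model weight."""

    def __init__(self, n_in, seed=0):
        rng = random.Random(seed)
        self.n_out = MAX_LAYERS * MAX_WIDTH * MAX_WIDTH
        self.A = [[rng.gauss(0.0, 0.5) for _ in range(n_in)] for _ in range(self.n_out)]
        self.b = [rng.gauss(0.0, 0.5) for _ in range(self.n_out)]

    def __call__(self, lam):
        flat = [sum(a * x for a, x in zip(row, lam)) + bi for row, bi in zip(self.A, self.b)]
        n = MAX_WIDTH * MAX_WIDTH
        # Reshape into MAX_LAYERS square matrices of shape (MAX_WIDTH, MAX_WIDTH).
        return [[flat[layer * n + r * MAX_WIDTH: layer * n + (r + 1) * MAX_WIDTH]
                 for r in range(MAX_WIDTH)]
                for layer in range(MAX_LAYERS)]


def forward(x, weights, masks):
    """Main DOD model (an AE here); an all-zero mask is treated as a no-op layer."""
    h = list(x) + [0.0] * (MAX_WIDTH - len(x))
    for W, m in zip(weights, masks):
        if not any(m):
            continue  # No-op: pass the input through unchanged (assumed mechanism)
        z = [math.tanh(sum(w * hj for w, hj in zip(row, h))) for row in W]
        h = [zi * mi for zi, mi in zip(z, m)]  # shrink the layer to its masked width
    return h


hn = LinearHypernet(n_in=MAX_LAYERS)  # one HN shared by all architectures
x = [0.5, -0.2, 0.1, 0.3, -0.4]
for widths in ([4, 2, 4, 5], [3, 5]):
    W = hn(hp_vector(widths))
    print(widths, forward(x, W, arch_masks(pad_arch(widths))))
```

Both architectures get their weights from the same `hn` object, so trying a new HP setting costs one HN forward pass rather than training a separate DOD model, which is the kind of saving the abstract attributes to the single-HN design.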

Paper Structure

This paper contains 18 sections, 11 equations, 9 figures, 7 tables, and 1 algorithm.

Figures (9)

  • Figure 1: The HyPer framework. (Top) Offline meta-training of $f_{\text{val}}$ on historical datasets for proxy validation; (bottom) online model selection on a new dataset. We accelerate both meta-training and model selection using hypernetworks (HN).
  • Figure 2: Illustration of the proposed HN. (Top) The HN generates weights for a 4-layer AE with layer widths $[4,2,4,5]$. The weights $\widehat{\mathbf{W}}_{\bm{\phi}}$ are fed into the DOD model, and the hidden layers' dimensions are shrunk by the mask $\mathbf{A}$. (Bottom) The HN generates weights for a 2-layer AE with layer widths $[3,5]$. Here $\bm{\lambda}_{\text{arch}}$ is padded as $[3,0,0,5]$, and the architecture masks of the second and third layers are set to all zeros. When $\widehat{\mathbf{W}}_{\bm{\phi}}$ is fed into the DOD model, the zero masks act as "No Operation" (No-op) layers, in effect shrinking the DOD model from $4$ layers to $2$.
  • Figure 3: Loss of individual models during scheduled training. Lighter colors depict the loss curves of deeper architectures, which enter training early. Over the epochs, the loss is minimized for all models collectively.
  • Figure 4: Average running time (log scale) vs. average model ROC Rank. Meta-learning methods are depicted with solid markers. The Pareto frontier (red dashed line) shows the best methods under different time budgets. HyPer outperforms all methods at a reasonable computational cost.
  • Figure 5: Distribution of ROC Rank across datasets. HyPer achieves the best performance. The bottom three bars depict HyPer variants that do not fully tune the architecture HPs (for ablation). Significant paired-test results are marked with $^{*}$ ($p<0.1$), $^{**}$ ($p<0.01$), and $^{***}$ ($p<0.001$). See the appendix table for the $p$-values.
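The Pareto frontier in Figure 4 can be computed with a single sweep over methods sorted by running time. This sketch assumes lower is better on both axes (running time and ROC Rank); the `Method` record is a hypothetical helper, and the method names and numbers below are made up for illustration, not results from the paper.

```python
from typing import NamedTuple


class Method(NamedTuple):
    name: str
    time: float  # average running time; lower is better
    rank: float  # average ROC Rank; assumed lower is better


def pareto_frontier(methods):
    """Methods not dominated in (time, rank), returned in order of increasing time."""
    frontier = []
    best_rank = float("inf")
    # After sorting by time, a method is on the frontier only if it beats the
    # best rank of every faster (or equally fast) method that came before it.
    for m in sorted(methods, key=lambda t: (t.time, t.rank)):
        if m.rank < best_rank:
            frontier.append(m)
            best_rank = m.rank
    return frontier


# Made-up numbers for illustration only (not the paper's results).
methods = [Method("A", 1.0, 0.5), Method("B", 2.0, 0.3),
           Method("C", 3.0, 0.4), Method("D", 10.0, 0.2)]
print([m.name for m in pareto_frontier(methods)])
```

With these illustrative values the frontier is A, B, D: C is dominated by B, which is both faster and better ranked. Exact duplicates collapse to a single frontier entry.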