Meta-Learning for Unsupervised Outlier Detection with Optimal Transport

Prabhant Singh, Joaquin Vanschoren

TL;DR

The paper introduces LOTUS, a zero-shot meta-learner that uses low-rank Gromov-Wasserstein distances $\text{GW-LR}^{(r)}$ to quantify dataset similarity after an ICA-based transformation, enabling automated selection of unsupervised outlier detectors. Paired with GAMAOD, an AutoML extension for supervised outlier detection, the framework builds rich meta-data to enable fast, data-driven pipeline recommendations in cold-start scenarios. Empirical results on ADBench show that LOTUS outperforms the current state-of-the-art MetaOD and several PyOD baselines, a comparison backed by ROPE tests. The approach is open source and extensible to other unsupervised tasks, offering a practical path to generalizable AutoML in domains lacking labeled data.
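
For reference, the quantity behind the low-rank Gromov-Wasserstein distance can be written out as follows. This is the standard discrete Gromov-Wasserstein objective with a rank-constrained coupling, not a formula quoted from this summary. The notation is assumed here: $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{m \times m}$ are the intra-dataset distance matrices, $a$ and $b$ are the sample weights, $P$ is the coupling, and $\operatorname{rk}_{+}$ denotes nonnegative rank.

```latex
\[
\text{GW-LR}^{(r)}(\mu,\nu) \;=\; \min_{P \in \Pi_{a,b}(r)}
  \sum_{i,i'=1}^{n} \sum_{j,j'=1}^{m}
  \bigl(A_{ii'} - B_{jj'}\bigr)^{2}\, P_{ij}\, P_{i'j'},
\]
\[
\Pi_{a,b}(r) \;=\; \bigl\{ P \in \mathbb{R}_{+}^{n \times m} :
  P\mathbf{1}_{m} = a,\; P^{\top}\mathbf{1}_{n} = b,\;
  \operatorname{rk}_{+}(P) \le r \bigr\}.
\]
```

Because only the intra-dataset distances $A$ and $B$ enter the objective, the two datasets do not need to share a feature space or a sample count, which is what makes the distance usable for comparing arbitrary datasets.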

Abstract

Automated machine learning has been widely researched and adopted for supervised classification and regression, but progress in unsupervised settings has been limited. We propose a novel approach to automate outlier detection based on meta-learning from previous datasets with outliers. Our premise is that the selection of the optimal outlier detection technique depends on the inherent properties of the data distribution. In particular, we leverage optimal transport to find the dataset with the most similar underlying distribution, and then apply the outlier detection techniques that proved to work best for that data distribution. We evaluate the robustness of our approach and find that it outperforms the state-of-the-art methods in unsupervised outlier detection. This approach can also be easily generalized to automate other unsupervised settings.
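
The selection step described in the abstract (find the most similar previous dataset, then reuse what worked best on it) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy datasets, and the pipeline labels are all hypothetical, and `proxy_distance` is a crude stand-in that compares quantiles of intra-dataset pairwise distances. It is not the low-rank Gromov-Wasserstein solver, and the ICA-based preprocessing is omitted entirely.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence

Dataset = Sequence[Sequence[float]]


def distance_profile(data: Dataset, n_quantiles: int = 10) -> List[float]:
    """Evenly spaced order statistics of the pairwise Euclidean distances."""
    dists = sorted(math.dist(a, b) for a, b in combinations(data, 2))
    last = len(dists) - 1
    return [dists[round(q * last / (n_quantiles - 1))] for q in range(n_quantiles)]


def proxy_distance(a: Dataset, b: Dataset) -> float:
    """ASSUMPTION: stand-in for GW. Uses only intra-dataset distances,
    so datasets with different feature counts can still be compared."""
    pa, pb = distance_profile(a), distance_profile(b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))


def recommend(
    new_data: Dataset,
    meta_datasets: Dict[str, Dataset],
    best_pipeline: Dict[str, str],
    distance: Callable[[Dataset, Dataset], float] = proxy_distance,
) -> str:
    """Return the stored best pipeline of the nearest meta-dataset."""
    nearest = min(meta_datasets, key=lambda name: distance(new_data, meta_datasets[name]))
    return best_pipeline[nearest]


# Hypothetical meta-data: two previous datasets and the detector that did best on each.
meta_datasets = {
    "tight": [(0, 0), (0, 1), (1, 0), (1, 1)],
    "spread": [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)],
}
best_pipeline = {"tight": "KNN", "spread": "IForest"}

# A new, unlabeled dataset with a different number of features.
new_data = [(0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]
print(recommend(new_data, meta_datasets, best_pipeline))
```

The `distance` argument is kept pluggable so that a real optimal-transport solver could replace the stand-in without touching the lookup logic. No labels from the new dataset are used at any point, which is the zero-shot property the approach relies on.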

Paper Structure

This paper contains 24 sections, 10 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An overview of LOTUS
  • Figure 2: An overview of GAMAOD
  • Figure 3: ROPE test LOTUS vs MetaOD.
  • Figure 4: Comparison of average rank (lower is better) of methods w.r.t. performance across datasets in ADBench.
  • Figure 5: ROPE test result of LOTUS vs (a) ABOD (b) HBOS (c) COF (d) IForest (e) LODA (f) KNN (g) OCSVM