How Low Can You Go? Surfacing Prototypical In-Distribution Samples for Unsupervised Anomaly Detection

Felix Meissen, Johannes Getzner, Alexander Ziller, Özgün Turgut, Georgios Kaissis, Martin J. Menten, Daniel Rueckert

TL;DR

This work shows that UAD with extremely few training samples can already match -- and in some cases even surpass -- the performance of training with the whole training dataset, and proposes an unsupervised method to reliably identify prototypical samples to further boost UAD performance.

Abstract

Unsupervised anomaly detection (UAD) alleviates large labeling efforts by training exclusively on unlabeled in-distribution data and detecting outliers as anomalies. Generally, the assumption prevails that large training datasets allow the training of higher-performing UAD models. However, in this work, we show that UAD with extremely few training samples can already match -- and in some cases even surpass -- the performance of training with the whole training dataset. Building upon this finding, we propose an unsupervised method to reliably identify prototypical samples to further boost UAD performance. We demonstrate the utility of our method on seven different established UAD benchmarks from computer vision, industrial defect detection, and medicine. With just 25 selected samples, we even exceed the performance of full training in $25/67$ categories in these benchmarks. Additionally, we show that the prototypical in-distribution samples identified by our proposed method generalize well across models and datasets and that observing their sample selection criteria allows for a successful manual selection of small subsets of high-performing samples. Our code is available at https://anonymous.4open.science/r/uad_prototypical_samples/

Paper Structure (31 sections, 10 equations, 8 figures, 6 tables, 1 algorithm)

This paper contains 31 sections, 10 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: Selecting only a few prototypical in-distribution samples (identified by our method) for training can result in higher anomaly detection performance than training with $100\%$ of the available data. Results for anomaly detection on the cat class from CIFAR10. Black dashed: full training. Yellow: randomly selected samples, including standard deviations over different random selections. Green: best-performing samples identified with our method.
  • Figure 2: Performance in RSNA does not increase with more training samples.
  • Figure 3: Best- and worst-performing samples in CIFAR10 and RSNA. Identified using our proposed core-set selection strategy.
  • Figure 4: Prototypical samples transfer well to other datasets and models. Left: Samples surfaced with the RD model achieve high performance when used with FAE. Test performance of FAE on RSNA when samples are selected using RD (solid lines) or FAE (dashed lines). Right: Training with 25 carefully selected samples from RSNA can exceed full training performance with CheXpert (8443 samples) when evaluated on the latter. Test performance on CheXpert after training on CheXpert samples (black, dashed line) or RSNA (other lines).
  • Figure 5: Only 10 representative training samples are needed to surpass the performance of training with the whole dataset on five out of the ten classes in CIFAR10. AUROC for training with 1, 5, 10, and 25 random or best (greedy selection) samples on CIFAR10. For random samples, the experiments were repeated ten times with different samples. The dashed black line represents training with all 4000 ID samples.
  • ...and 3 more figures
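
The caption of Figure 5 refers to a greedy selection of the best samples and reports AUROC. As a rough illustration of what such a procedure can look like, the following is a minimal sketch of generic greedy forward selection, not necessarily the authors' procedure. Everything in it is an assumption made for illustration: the names `auroc`, `greedy_select`, and `evaluate`, the one-dimensional toy data, and the nearest-neighbor distance used as an anomaly score. It also scores candidate subsets on a small labeled evaluation set, which is a convenience of the sketch and says nothing about how the paper's unsupervised method works.

```python
from typing import Callable, List, Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the probability that an anomaly (label 1) receives a higher
    score than a normal sample (label 0); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one anomalous and one normal sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def greedy_select(
    num_candidates: int,
    evaluate: Callable[[List[int]], float],
    k: int,
) -> List[int]:
    """Greedy forward selection: repeatedly add the candidate index that
    maximizes evaluate(selected + [index]). Ties go to the lowest index."""
    selected: List[int] = []
    remaining = list(range(num_candidates))
    for _ in range(min(k, num_candidates)):
        best = max(remaining, key=lambda i: evaluate(selected + [i]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "normal" samples cluster around 0; 5.0 is an
# atypical training sample that hurts detection when used on its own.
train_pool = [0.0, 0.1, 5.0, -0.2, 0.05]
eval_x = [0.0, 0.2, -0.1, 3.0, 4.0, -3.0]
eval_y = [0, 0, 0, 1, 1, 1]


def evaluate(indices: List[int]) -> float:
    """Stand-in for 'train a model on this subset and measure AUROC':
    the anomaly score is the distance to the nearest selected sample."""
    train = [train_pool[i] for i in indices]
    scores = [min(abs(x - t) for t in train) for x in eval_x]
    return auroc(scores, eval_y)


subset = greedy_select(len(train_pool), evaluate, k=2)
print(subset, evaluate(subset))
```

In a real setting, `evaluate` would train and test an actual UAD model for every candidate, so the cost grows with the pool size times `k`; the toy scoring function here only keeps the example self-contained.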