SSD: A Unified Framework for Self-Supervised Outlier Detection

Vikash Sehwag, Mung Chiang, Prateek Mittal

TL;DR

SSD tackles outlier detection using only unlabeled in-distribution data: it learns representations through self-supervised contrastive learning and applies a cluster-conditioned Mahalanobis distance in feature space. The paper shows that this unlabeled approach can outperform many unsupervised detectors and rival supervised methods across datasets such as CIFAR-10/100, STL-10, and ImageNet. It further extends the framework with SSDk for few-shot OOD detection and SSD+, which leverages available labels to push performance beyond the current state of the art. Together, these contributions provide a flexible solution for OOD detection with minimal labeling requirements, and the code is open source.

Abstract

We ask the following question: what training information is required to design an effective outlier/out-of-distribution (OOD) detector, i.e., detecting samples that lie far away from the training distribution? Since unlabeled data is easily accessible for many applications, the most compelling approach is to develop detectors based on only unlabeled in-distribution data. However, we observe that most existing detectors based on unlabeled data perform poorly, often equivalent to a random prediction. In contrast, existing state-of-the-art OOD detectors achieve impressive performance but require access to fine-grained data labels for supervised training. We propose SSD, an outlier detector based on only unlabeled in-distribution data. We use self-supervised representation learning followed by Mahalanobis distance based detection in the feature space. We demonstrate that SSD outperforms most existing detectors based on unlabeled data by a large margin. Additionally, SSD even achieves performance on par with, and sometimes even better than, supervised training based detectors. Finally, we expand our detection framework with two key extensions. First, we formulate few-shot OOD detection, in which the detector has access to only one to five samples from each class of the targeted OOD dataset. Second, we extend our framework to incorporate training data labels, if available. We find that our novel detection framework based on SSD displays enhanced performance with these extensions, and achieves state-of-the-art performance. Our code is publicly available at https://github.com/inspire-group/SSD.
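
The detection step described in the abstract can be illustrated with a minimal sketch. It assumes features have already been extracted by some encoder and models them with a single Gaussian; the function names (`fit_gaussian`, `mahalanobis_score`, `auroc`) and the synthetic arrays are hypothetical stand-ins, not the authors' released implementation or data.

```python
import numpy as np

def fit_gaussian(train_features):
    """Estimate the mean and inverse covariance of in-distribution features."""
    mean = train_features.mean(axis=0)
    cov = np.cov(train_features, rowvar=False)
    # Pseudo-inverse guards against a singular covariance matrix.
    return mean, np.linalg.pinv(cov)

def mahalanobis_score(features, mean, cov_inv):
    """Squared Mahalanobis distance of each row to the fitted Gaussian."""
    centered = features - mean
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)

def auroc(in_scores, ood_scores):
    """Probability that a random OOD sample scores higher than a random
    in-distribution sample (ties count as one half)."""
    in_col = np.asarray(in_scores)[:, None]
    ood_row = np.asarray(ood_scores)[None, :]
    return float(np.mean((ood_row > in_col) + 0.5 * (ood_row == in_col)))

# Synthetic stand-ins for encoder features (assumption: not data from the paper).
rng = np.random.default_rng(0)
train_feats = rng.normal(0.0, 1.0, size=(2000, 16))
test_in_feats = rng.normal(0.0, 1.0, size=(500, 16))
test_ood_feats = rng.normal(3.0, 1.0, size=(500, 16))

mean, cov_inv = fit_gaussian(train_feats)
in_scores = mahalanobis_score(test_in_feats, mean, cov_inv)
ood_scores = mahalanobis_score(test_ood_feats, mean, cov_inv)
detector_auroc = auroc(in_scores, ood_scores)
```

A higher score marks a sample as more likely to be an outlier. Only unlabeled in-distribution features are used to fit the mean and covariance; the OOD features appear solely at evaluation time.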

Paper Structure

This paper contains 23 sections, 5 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: AUROC along individual principal eigenvectors with CIFAR-10 as in-distribution and CIFAR-100 as OOD. Components with higher eigenvalues dominate Euclidean distance but are the least helpful for outlier detection. Mahalanobis distance avoids this bias with appropriate scaling and performs much better.
  • Figure 2: Ablating across different training parameters in SSD under the following setup: In-distribution dataset = CIFAR-10, OOD dataset = CIFAR-100, Training epochs = 500, Batch size = 512.
  • Figure 3: AUROC over the course of training with CIFAR-10 as in-distribution and CIFAR-100 as OOD set.
  • Figure 4: Existing supervised detectors require fine-grained labels. In contrast, SSD can achieve similar performance with only unlabeled data.
  • Figure 5: Relationship of AUROC with clusters depends on which block we use as the feature extractor.
  • ...and 2 more figures
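
The scaling argument in the Figure 1 caption can be checked numerically. The sketch below assumes synthetic three-dimensional features with very different variances per direction (the `scales` values and the test point `x` are hypothetical). It shows that the squared Mahalanobis distance equals the squared Euclidean distance after each principal component is divided by its eigenvalue, so low-variance directions are no longer drowned out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical features: standard deviations 10, 1, and 0.1 along the three axes.
scales = np.array([10.0, 1.0, 0.1])
train = rng.normal(size=(5000, 3)) * scales
mean = train.mean(axis=0)
cov = np.cov(train, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# A test point that deviates only along the low-variance direction.
x = np.array([0.0, 0.0, 1.0])
proj = (x - mean) @ eigvecs  # coordinates along each principal eigenvector

euclidean_sq = np.sum(proj ** 2)               # every direction weighted equally
mahalanobis_sq = np.sum(proj ** 2 / eigvals)   # each direction scaled by 1 / eigenvalue

# Direct computation with the inverse covariance, for comparison.
direct = (x - mean) @ np.linalg.inv(cov) @ (x - mean)
```

Here the deviation is small in Euclidean terms but large relative to the spread of the training data in that direction, which is exactly the case the per-eigenvalue scaling is meant to expose.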