Deep Metric Learning-Based Out-of-Distribution Detection with Synthetic Outlier Exposure

Assefa Seyoum Wahd

TL;DR

This work tackles OOD detection in classification by combining deep metric learning with synthetic OOD data generated by conditional diffusion models under a label-mixup strategy. It adapts angular-margin metric losses (SphereFace, CosFace, ArcFace, AdaCos) as OOD score functions and trains detectors both with and without the synthetic OOD data. Experiments with CIFAR-10 as the ID dataset and several OOD benchmarks show that diffusion-based synthetic OOD data combined with metric-learning losses improve AUROC and AUPR over softmax baselines while preserving ID accuracy. The approach offers a scalable way to cover diverse OOD scenarios without relying on real-world unlabeled data, and it marks the first use of diffusion-based label mixup for synthetic OOD data in a multi-class setting.
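To make the angular-margin idea concrete, the following is a minimal NumPy sketch of how the target-class logit is modified in three of the named losses. It uses the commonly cited forms cos(m·θ) for SphereFace, cos(θ) − m for CosFace, and cos(θ + m) for ArcFace. AdaCos, which adapts the scale rather than adding a margin, is omitted. The function names, the scale `s`, and the use of the maximum cosine similarity as an OOD score are assumptions for illustration; this summary does not state the exact score function the paper uses.

```python
import numpy as np

def _cosines(features, weights):
    # L2-normalise rows so that dot products become cosine similarities.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    return np.clip(f @ w.T, -1.0, 1.0)

def margin_logits(features, weights, labels, kind, m, s=30.0):
    """Scaled cosine logits with an angular margin on the target class.

    features: (N, D) embeddings, weights: (K, D) class centres,
    labels: (N,) integer class indices. Hypothetical helper, not the
    paper's implementation.
    """
    cos = _cosines(features, weights)
    rows = np.arange(len(labels))
    theta_y = np.arccos(cos[rows, labels])  # angle to the true class centre
    out = cos.copy()
    if kind == "arcface":
        out[rows, labels] = np.cos(theta_y + m)   # additive angular margin
    elif kind == "cosface":
        out[rows, labels] = cos[rows, labels] - m  # additive cosine margin
    elif kind == "sphereface":
        # Simple multiplicative form; the original loss uses a piecewise
        # monotonic extension for angles beyond pi / m.
        out[rows, labels] = np.cos(m * theta_y)
    else:
        raise ValueError(f"unknown margin type: {kind}")
    return s * out

def max_cosine_score(features, weights):
    # Assumed OOD score: a low maximum cosine suggests an OOD input.
    return _cosines(features, weights).max(axis=1)
```

The margin is applied only during training; at test time a score such as `max_cosine_score` would be thresholded to separate ID from OOD inputs.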

Abstract

In this paper, we present a novel approach that combines deep metric learning and synthetic data generation using diffusion models for out-of-distribution (OOD) detection. One popular approach for OOD detection is outlier exposure, where models are trained on a mixture of in-distribution (ID) samples and "seen" OOD samples. For the OOD samples, the model is trained to minimize the KL divergence between its output probability and the uniform distribution, while still classifying the ID data correctly. We propose a label-mixup approach to generate synthetic OOD data using Denoising Diffusion Probabilistic Models (DDPMs). Additionally, we explore recent advances in metric learning to train our models. In our experiments, metric learning-based loss functions outperform the softmax loss. Furthermore, the baseline models (both softmax and metric learning) improve significantly when trained with the generated OOD data. Our approach outperforms strong baselines on conventional OOD detection metrics.
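The outlier exposure objective described above can be sketched as a cross-entropy term on ID samples plus a KL term that pushes OOD predictions toward the uniform distribution. The sketch below is illustrative only: the abstract does not state the direction of the KL divergence or the weighting between the two terms, so KL(U ‖ p) and the weight `lam` are assumptions, and the function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def outlier_exposure_loss(id_logits, id_labels, ood_logits, lam=0.5):
    """Cross-entropy on ID data plus lam * KL(uniform || softmax) on OOD data.

    id_logits: (N, K), id_labels: (N,) integer classes, ood_logits: (M, K).
    lam is an assumed trade-off weight, not a value taken from the paper.
    """
    id_logp = log_softmax(id_logits)
    ce = -id_logp[np.arange(len(id_labels)), id_labels].mean()

    ood_logp = log_softmax(ood_logits)
    k = ood_logits.shape[1]
    # KL(U || p) = sum_k (1/K) * (log(1/K) - log p_k)
    #            = -log K - mean_k log p_k
    kl = (-np.log(k) - ood_logp.mean(axis=1)).mean()
    return ce + lam * kl
```

The KL term is zero exactly when the model's prediction on every OOD sample is uniform, and it grows as those predictions become more peaked.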

Paper Structure (11 sections, 8 equations, 1 figure, 2 tables)

Figures (1)

  • Figure 1: Synthetic OOD data generated using label mixup between the CIFAR-10 "airplane" and "automobile" classes. The generated data show significant diversity and meaningful mixup semantics. For example, a mixup between the airplane class and the automobile class results in an object with features of both an airplane and an automobile.
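
One plausible reading of the label mixup in Figure 1 is that the diffusion model is conditioned on a convex combination of two one-hot class labels instead of a single class. The sketch below builds such a conditioning vector for CIFAR-10 "airplane" (index 0) and "automobile" (index 1) and maps it through a class-embedding table. How the paper actually feeds the mixed label to the DDPM is not stated in this summary, so the helper names, the mixing coefficient `lam`, and the embedding table are all hypothetical.

```python
import numpy as np

def mixup_condition(class_a, class_b, num_classes, lam=0.5):
    """Soft label lam * onehot(class_a) + (1 - lam) * onehot(class_b)."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    cond = np.zeros(num_classes)
    cond[class_a] += lam
    cond[class_b] += 1.0 - lam
    return cond

def mixed_class_embedding(cond, embedding_table):
    # Weighted average of the per-class embedding rows; this is the vector
    # a conditional denoiser would receive in place of a single-class one.
    return cond @ embedding_table

# CIFAR-10 class order: 0 = airplane, 1 = automobile.
cond = mixup_condition(0, 1, num_classes=10, lam=0.5)
table = np.random.default_rng(0).normal(size=(10, 16))  # stand-in embeddings
mixed = mixed_class_embedding(cond, table)
```

Samples drawn under such a mixed condition belong to neither source class, which is what would make them usable as synthetic OOD training data.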