Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation

Moru Liu, Hao Dong, Jessica Kelly, Olga Fink, Mario Trapp

TL;DR

This work tackles the critical problem of out-of-distribution detection and segmentation in multimodal settings, where prior methods largely focus on unimodal data. It introduces Feature Mixing, a lightweight, theoretically supported method that synthesizes multimodal outliers in feature space by swapping a subset of dimensions between modalities, producing low-likelihood yet bounded outliers. Entropy-based optimization on these outliers augments training to better separate in-distribution and out-of-distribution signals, while a simple late-fusion framework enables practical, cross-modal application. The authors also contribute CARLA-OOD, a challenging synthetic dataset for multimodal OOD segmentation, and demonstrate state-of-the-art performance with substantial speedups (10x for OOD detection and 370x for segmentation) across eight datasets and four modalities, validating the method’s versatility and impact for real-time safety-critical systems.
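The swapping step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `feature_mixing`, the parameter `n_swap`, the requirement that both modalities share the same feature dimension, and the choice to swap the same randomly selected indices in both modalities are all assumptions made for this example.

```python
import numpy as np

def feature_mixing(f_a, f_b, n_swap, rng):
    """Swap n_swap randomly chosen feature dimensions between two modalities.

    f_a, f_b: arrays of shape (batch, dim). Equal shapes are an assumption
    of this sketch, not a requirement stated in the source.
    Returns the concatenated mixed features of shape (batch, 2 * dim).
    """
    if f_a.shape != f_b.shape:
        raise ValueError("this sketch assumes both modalities share one shape")
    dim = f_a.shape[1]
    # Pick n_swap distinct dimensions (hypothetical selection rule).
    idx = rng.choice(dim, size=n_swap, replace=False)
    mixed_a, mixed_b = f_a.copy(), f_b.copy()
    mixed_a[:, idx] = f_b[:, idx]  # modality A receives B's values
    mixed_b[:, idx] = f_a[:, idx]  # modality B receives A's values
    return np.concatenate([mixed_a, mixed_b], axis=1)

# Toy features standing in for two modality encoders (hypothetical data).
rng = np.random.default_rng(0)
f_img = rng.normal(2.0, 1.0, size=(4, 8))
f_pts = rng.normal(-2.0, 1.0, size=(4, 8))
f_out = feature_mixing(f_img, f_pts, n_swap=3, rng=rng)
```

Because swapping only moves existing values between positions, each synthesized vector contains exactly the same values as the original concatenated features, which gives some intuition for why such outliers stay bounded.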

Abstract

Out-of-distribution (OOD) detection and segmentation are crucial for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. While prior research has primarily focused on unimodal image data, real-world applications are inherently multimodal, requiring the integration of multiple modalities for improved OOD detection. A key challenge is the lack of supervision signals from unknown data, leading to overconfident predictions on OOD samples. To address this challenge, we propose Feature Mixing, an extremely simple and fast method for multimodal outlier synthesis with theoretical support, which can be further optimized to help the model better distinguish between in-distribution (ID) and OOD data. Feature Mixing is modality-agnostic and applicable to various modality combinations. Additionally, we introduce CARLA-OOD, a novel multimodal dataset for OOD segmentation, featuring synthetic OOD objects across diverse scenes and weather conditions. Extensive experiments on SemanticKITTI, nuScenes, CARLA-OOD datasets, and the MultiOOD benchmark demonstrate that Feature Mixing achieves state-of-the-art performance with a $10 \times$ to $370 \times$ speedup. Our source code and dataset will be available at https://github.com/mona4399/FeatureMixing.
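One common way to optimize a model on synthesized outliers is to push its predictions on them toward high entropy. The sketch below assumes that form of entropy-based objective; the helper names `softmax`, `mean_entropy`, and `outlier_entropy_loss` are hypothetical, and the exact loss used by the authors may differ.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mean_entropy(logits):
    """Average Shannon entropy (in nats) of the softmax predictions."""
    p = softmax(logits)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

def outlier_entropy_loss(outlier_logits):
    # Minimizing this term maximizes predictive entropy on outliers
    # (assumed objective, for illustration only).
    return -mean_entropy(outlier_logits)

# An overconfident prediction versus a maximally uncertain one.
confident = np.array([[10.0, 0.0, 0.0]])
uniform = np.zeros((1, 3))
```

Under this objective the uniform prediction yields the lower loss, so training on outliers would discourage overconfident outputs for them.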

Paper Structure

This paper contains 28 sections, 2 theorems, 34 equations, 15 figures, 7 tables, 1 algorithm.

Key Result

Theorem 1

Outliers $\mathbf{F}_o$ synthesized by Feature Mixing lie in low-likelihood regions of the distribution of the ID features $\mathbf{F}$, complying with the criterion for real outliers.
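The claim can be checked numerically on toy data. The following sketch is not the paper's proof: it assumes the ID features follow a diagonal Gaussian with clearly separated per-modality means, and all variable names and constants are chosen only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, n_swap = 2000, 16, 4

# Toy ID features for two modalities with different means (assumption).
f_a = rng.normal(+3.0, 1.0, size=(n, d))
f_b = rng.normal(-3.0, 1.0, size=(n, d))
f_id = np.concatenate([f_a, f_b], axis=1)

# Synthesize outliers by swapping n_swap dimensions between modalities.
idx = rng.choice(d, size=n_swap, replace=False)
mixed_a, mixed_b = f_a.copy(), f_b.copy()
mixed_a[:, idx], mixed_b[:, idx] = f_b[:, idx], f_a[:, idx]
f_o = np.concatenate([mixed_a, mixed_b], axis=1)

# Fit a diagonal Gaussian to the ID features.
mu = f_id.mean(axis=0)
var = f_id.var(axis=0)

def log_likelihood(x):
    """Per-sample log-density under the fitted diagonal Gaussian."""
    return -0.5 * (((x - mu) ** 2) / var + np.log(2 * np.pi * var)).sum(axis=1)

ll_id = log_likelihood(f_id).mean()
ll_out = log_likelihood(f_o).mean()
```

In this toy setting the mixed features receive a much lower average log-likelihood than the ID features, while their norms are unchanged because swapping only permutes values within each concatenated vector.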

Figures (15)

  • Figure 1: Mixup [zhang2017mixup] is efficient for outlier synthesis but performs poorly in OOD segmentation. In contrast, NP-Mix [dong2024multiood] achieves strong OOD segmentation but is computationally expensive. Our Feature Mixing combines both speed and performance, benefiting from its simple yet effective design. Results are on the SemanticKITTI dataset.
  • Figure 2: (a) Uncertainty-based OOD methods face overconfidence issues, resulting in significant overlap between the score distributions of ID and OOD samples. (b) After training with outlier optimization, the confidence scores for ID and OOD samples become more distinct, enabling the model to better differentiate them. Results are on the CARLA-OOD dataset.
  • Figure 3: Illustration of Feature Mixing.
  • Figure 4: Visualization of multimodal outlier synthesis results. Feature Mixing generates outlier samples that span a wider region of the embedding space without injecting noise, and it does so extremely quickly.
  • Figure 5: Overview of the proposed framework that integrates Feature Mixing for multimodal OOD detection and segmentation.
  • ...and 10 more figures

Theorems & Definitions (2)

  • Theorem 1
  • Theorem 2