DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection

Shawn Li, Huixian Gong, Hao Dong, Tiankai Yang, Zhengzhong Tu, Yue Zhao

TL;DR

This paper proposes Dynamic Prototype Updating (DPU), a novel plug-and-play framework for multimodal OOD detection that accounts for intra-class variations, improving OOD detection performance and setting a new state of the art in multimodal OOD detection.

Abstract

Out-of-distribution (OOD) detection is essential for ensuring the robustness of machine learning models by identifying samples that deviate from the training distribution. While traditional OOD detection has primarily focused on single-modality inputs, such as images, recent advances in multimodal models have demonstrated the potential of leveraging multiple modalities (e.g., video, optical flow, audio) to enhance detection performance. However, existing methods often overlook intra-class variability within in-distribution (ID) data, assuming that samples of the same class are perfectly cohesive and consistent. This assumption can lead to performance degradation, especially when prediction discrepancies are uniformly amplified across all samples. To address this issue, we propose Dynamic Prototype Updating (DPU), a novel plug-and-play framework for multimodal OOD detection that accounts for intra-class variations. Our method dynamically updates class center representations for each class by measuring the variance of similar samples within each batch, enabling adaptive adjustments. This approach allows us to amplify prediction discrepancies based on the updated class centers, thereby improving the model's robustness and generalization across different modalities. Extensive experiments on two tasks, five datasets, and nine base OOD algorithms demonstrate that DPU significantly improves OOD detection performance, setting a new state-of-the-art in multimodal OOD detection, with improvements of up to 80 percent in Far-OOD detection. To facilitate accessibility and reproducibility, our code is publicly available on GitHub.
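The abstract describes updating each class center from the variance of same-class samples within a batch. The following is a minimal sketch of that idea only, not the authors' update rule: the function name `update_prototypes`, the `base_rate` parameter, and the `base_rate / (1 + variance)` step size are all assumptions chosen for illustration.

```python
from collections import defaultdict


def update_prototypes(prototypes, features, labels, base_rate=0.1):
    """Variance-aware moving-average update of class prototypes.

    Illustrative only: the step-size rule below is an assumption,
    not the formula used in the paper.
    """
    # Group the batch features by class label.
    by_class = defaultdict(list)
    for feat, label in zip(features, labels):
        by_class[label].append(feat)

    updated = dict(prototypes)
    for label, feats in by_class.items():
        dim = len(feats[0])
        mean = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
        # Mean squared distance to the batch mean: a simple proxy
        # for intra-class variance within this batch.
        var = sum(
            sum((f[d] - mean[d]) ** 2 for d in range(dim)) for f in feats
        ) / len(feats)
        # Assumption: a more scattered batch takes a smaller step, so
        # outliers move the prototype less than a cohesive batch does.
        rate = base_rate / (1.0 + var)
        # A class seen for the first time starts from its batch mean.
        old = prototypes.get(label, mean)
        updated[label] = [(1 - rate) * o + rate * m for o, m in zip(old, mean)]
    return updated


if __name__ == "__main__":
    protos = {0: [0.0, 0.0]}
    cohesive = update_prototypes(protos, [[1.0, 1.0], [1.0, 1.0]], [0, 0])
    scattered = update_prototypes(protos, [[0.0, 0.0], [2.0, 2.0]], [0, 0])
    print(cohesive[0], scattered[0])
```

Both example batches have the same mean, but under this assumed rule the scattered batch moves the prototype less than the cohesive one, which is the adaptive behavior the abstract describes in words.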

Paper Structure

This paper contains 25 sections, 18 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Performance of DPU applied to four base OOD methods in the Multimodal Far-OOD Detection task (§\ref{ssec:task-impl-detail}), using HMDB51 as the ID dataset and Kinetics600 as the OOD dataset. Red symbols denote the OOD methods enhanced by DPU, showing that DPU significantly improves their performance.
  • Figure 2: Overview of DPU. It dynamically adjusts multimodal discrepancy intensification based on each sample’s distance to its class prototype. Key components include: (Step 1, §\ref{subsec:contrastive}) Cohesive-Separate Contrastive Training (CSCT), which preserves intra-class cohesion and inter-class distinctions while capturing within-class variances; (Step 2, §\ref{subsec:approximation}) Dynamic Prototype Approximation (DPA), which refines prototypes so that they remain representative despite class outliers; (Step 3, §\ref{subsec:proration}) Pro-ratio Discrepancy Intensification (PDI), which adjusts discrepancy based on sample-prototype similarity, boosting ID accuracy and robustness. Finally, OOD models leverage both joint and modality-specific features for robust OOD detection.
  • Figure 3: Ablation study on contrastive learning methods. The experiment uses MSP for Near-OOD detection on the UCF101 and Kinetics-600 datasets. Original: plain MSP. InfoNCE: the classic contrastive learning method (van den Oord et al., 2018). CSCT: the proposed Cohesive-Separate Contrastive Training.
  • Figure 4: Visualization of the learned embeddings on ID and OOD data using t-SNE on the UCF101 50/51 dataset before and after training with DPU. We observe better separation after using DPU.
  • Figure A: ID accuracy declines after applying uniform discrepancy intensification in the SOTA framework (Dong et al., 2024), denoted 'AN' (middle bars), and improves with the proposed DPU (right bars). The figure shows results for MSP and ReAct in Far-OOD detection using HMDB51 as the ID dataset.
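
The InfoNCE baseline named in the Figure 3 ablation can be sketched as below. This is the standard InfoNCE loss for a single anchor, shown for reference and not the paper's CSCT objective; the function names, the cosine similarity, and the temperature value are assumptions, and all vectors are assumed to be non-zero.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log softmax score of the positive."""
    # The positive is placed at index 0, followed by the negatives.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    aligned = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
    misaligned = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]])
    print(aligned, misaligned)
```

The loss is close to zero when the positive is much more similar to the anchor than every negative, and it grows when a negative is closer than the positive.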