Adaptive Confidence Regularization for Multimodal Failure Detection

Moru Liu; Hao Dong; Olga Fink; Mario Trapp

TL;DR

This work proposes Adaptive Confidence Regularization (ACR), a framework for detecting failures of multimodal models. It introduces an Adaptive Confidence Loss that penalizes confidence degradation during training, i.e., cases where the multimodal prediction is less confident than at least one unimodal branch.

Abstract

The deployment of multimodal models in high-stakes domains, such as self-driving vehicles and medical diagnostics, demands not only strong predictive performance but also reliable mechanisms for detecting failures. In this work, we address the largely unexplored problem of failure detection in multimodal contexts. We propose Adaptive Confidence Regularization (ACR), a novel framework specifically designed to detect multimodal failures. Our approach is driven by a key observation: in most failure cases, the confidence of the multimodal prediction is significantly lower than that of at least one unimodal branch, a phenomenon we term confidence degradation. To mitigate this, we introduce an Adaptive Confidence Loss that penalizes such degradations during training. In addition, we propose Multimodal Feature Swapping, a novel outlier synthesis technique that generates challenging, failure-aware training examples. By training with these synthetic failures, ACR learns to more effectively recognize and reject uncertain predictions, thereby improving overall reliability. Extensive experiments across four datasets, three modalities, and multiple evaluation settings demonstrate that ACR achieves consistent and robust gains. The source code will be available at https://github.com/mona4399/ACR.

Paper Structure

This paper contains 25 sections, 2 theorems, 10 equations, 10 figures, 12 tables, 2 algorithms.

Introduction
Methodology
Problem Setup
Confidence Degradation: A Failure Indicator in Multimodal Systems
Proposed ACR Framework
Adaptive Confidence Loss
Multimodal Feature Swapping
Inference
Experiments
Experimental Setup
Main Results
Ablation Studies
Conclusion
Theoretical Analysis on Confidence Degradation
Broader Impact, Limitations, and Future Work
...and 10 more sections

Key Result

Theorem 1

Adding more modalities cannot increase the conditional entropy.
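
A statement of this kind is commonly written as below. This is a minimal sketch in standard information-theoretic notation; the symbols $Y$ (the label) and $X_1, X_2$ (two modalities) are assumptions made here for illustration and may differ from the paper's own notation. The second line gives the usual reason: the gap between the two entropies is a conditional mutual information, which is non-negative.

```latex
H(Y \mid X_1, X_2) \;\le\; H(Y \mid X_1),
\qquad
H(Y \mid X_1) - H(Y \mid X_1, X_2) \;=\; I(Y; X_2 \mid X_1) \;\ge\; 0 .
```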

Figures (10)

Figure 1: (Left) Multimodal models substantially enhance FD performance compared to unimodal models, without the need for complex designs. (Right) Advanced OOD detection methods underperform on FD tasks, while the simple MSP baseline surprisingly remains the most effective.
Figure 2: Misclassified samples exhibit a significantly higher proportion of confidence degradation compared to correctly classified ones.
Figure 3: Our ACR framework integrates two principal components. The Adaptive Confidence Loss is designed to penalize the phenomenon of confidence degradation. The Multimodal Feature Swapping serves to generate challenging, failure-aware training instances. This process enables the model to learn to more effectively identify and reject uncertain samples.
Figure 4: Visualization of outliers generated by Multimodal Feature Swapping with different $n_{\text{swap}}$ (96, 128, 256, 512). Small swaps produce hard negatives that lie near the in-distribution manifold, while larger swaps create more distinct outliers further away.
Figure 5: FD under distribution shift on the HAC dataset. Performance is reported for five types of video corruption at severity level $5$.
...and 5 more figures
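
The role of $n_{\text{swap}}$ in the Figure 4 caption can be illustrated with a small sketch. This is not the authors' implementation: the function name `swap_modal_features`, the choice to exchange randomly selected dimensions between two equal-length feature vectors of the same sample, and the use of plain Python lists are all assumptions made for illustration.

```python
import random


def swap_modal_features(feat_a, feat_b, n_swap, rng=None):
    """Exchange n_swap randomly chosen dimensions between two feature vectors.

    Hypothetical sketch: feat_a and feat_b stand for the embeddings of two
    modalities of one sample and are assumed to have the same length.
    Returns new lists; the inputs are left unchanged.
    """
    if len(feat_a) != len(feat_b):
        raise ValueError("feature vectors must have the same length")
    if not 0 <= n_swap <= len(feat_a):
        raise ValueError("n_swap must lie between 0 and the feature length")
    rng = rng if rng is not None else random.Random()

    # Pick n_swap distinct positions to exchange.
    positions = rng.sample(range(len(feat_a)), n_swap)

    out_a, out_b = list(feat_a), list(feat_b)
    for i in positions:
        out_a[i], out_b[i] = feat_b[i], feat_a[i]
    return out_a, out_b


# Example with toy 8-dimensional features and three swapped dimensions.
video_feat = [0.0] * 8
audio_feat = [1.0] * 8
outlier_video, outlier_audio = swap_modal_features(
    video_feat, audio_feat, n_swap=3, rng=random.Random(0)
)
```

Under this assumption, a small `n_swap` leaves most of each vector intact, which matches the caption's description of hard negatives near the in-distribution manifold, while a large `n_swap` yields vectors that differ more from the originals.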

Theorems & Definitions (3)

Definition 1: Confidence Degradation
Theorem 1
Theorem 2
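
Confidence degradation, as described in the abstract, occurs when the multimodal prediction is less confident than at least one unimodal branch. The sketch below shows one way to check for it. It assumes that confidence is measured by the maximum softmax probability (MSP), and the `margin` parameter and function names are hypothetical; the paper's formal Definition 1 may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp(logits):
    """Maximum softmax probability, used here as the confidence score."""
    return max(softmax(logits))


def has_confidence_degradation(multimodal_logits, unimodal_logits, margin=0.0):
    """Return True if any unimodal branch exceeds the fused confidence by more than margin.

    multimodal_logits: logits of the fused prediction.
    unimodal_logits: list of logit vectors, one per unimodal branch.
    margin: hypothetical threshold; 0.0 flags any drop in confidence.
    """
    fused_conf = msp(multimodal_logits)
    return any(msp(u) - fused_conf > margin for u in unimodal_logits)


# Toy example: the fused prediction is unsure, but the video branch is confident.
fused = [1.0, 0.8, 0.9]
video = [5.0, 0.0, 0.0]
audio = [0.2, 0.1, 0.0]
degraded = has_confidence_degradation(fused, [video, audio])
```

In the toy example the video branch is far more confident than the fused prediction, so the sample is flagged as degraded.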
