
Multimodal Learning with Uncertainty Quantification based on Discounted Belief Fusion

Grigor Bezirganyan, Sana Sellami, Laure Berti-Équille, Sébastien Fournier

TL;DR

Multimodal learning often faces uncertainty from noise and from conflicts between modalities, which can lead to overconfident incorrect predictions. The authors propose Discounted Belief Fusion (DBF), an order-invariant fusion scheme built on subjective logic that uses a conflict-based discounting mechanism to reallocate belief mass toward uncertainty when modalities conflict. DBF relies on generalized belief averaging to scale to $V$ modalities and on a per-sample discounting factor $\eta^v$ derived from a conflict-controlled agreement matrix, with a hyperparameter $\lambda$ controlling the discount strength. They also train multimodal evidential networks that output Dirichlet-based evidence and uncertainties through an exponential output activation, using a three-term loss that includes KL-divergence regularization and a consistency term. Experiments on five benchmarks show a clearer separation between the uncertainties of conflictive and non-conflictive samples while maintaining accuracy, suggesting practical gains in reliability and interpretability for safety-critical multimodal tasks.
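As a rough illustration of the evidential output described above, the sketch below maps raw network outputs to evidence with an exponential activation and then to a subjective opinion through a Dirichlet distribution with $\alpha_k = e_k + 1$, following the common evidential deep learning convention. The function names, the plain-Python setting, and the logit clamp (`max_logit`) are assumptions made for illustration, not the authors' implementation.

```python
import math
from typing import List, Tuple


def evidence_from_logits(logits: List[float], max_logit: float = 10.0) -> List[float]:
    """Exponential output activation: raw outputs -> non-negative evidence.
    The clamp at max_logit is a hypothetical overflow guard, not taken from the paper."""
    return [math.exp(min(z, max_logit)) for z in logits]


def opinion_from_evidence(evidence: List[float]) -> Tuple[List[float], float, List[float]]:
    """Evidence -> Dirichlet(alpha = e + 1) -> subjective opinion (beliefs, uncertainty)."""
    k = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)                      # S = sum(e) + K
    belief = [e / strength for e in evidence]  # b_k = e_k / S
    uncertainty = k / strength                 # u = K / S, so sum(b) + u = 1
    return belief, uncertainty, alpha


belief, u, alpha = opinion_from_evidence(evidence_from_logits([0.0, 0.0, 0.0]))
print(u)  # 0.5: each class gets evidence exp(0) = 1, so S = 6 and u = 3/6
```

With all-zero logits every class receives evidence 1, so for three classes the uncertainty is $3/6 = 0.5$; concentrating evidence on one class shrinks $u$.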

Abstract

Multimodal AI models are increasingly used in fields like healthcare, finance, and autonomous driving, where information is drawn from multiple sources or modalities such as images, text, audio, and video. However, effectively managing uncertainty, whether it arises from noise, insufficient evidence, or conflicts between modalities, is crucial for reliable decision-making. Current uncertainty-aware machine learning methods, such as those based on evidence averaging or evidence accumulation, underestimate uncertainties in high-conflict scenarios. Moreover, the state-of-the-art evidence averaging strategy is not order invariant and fails to scale to multiple modalities. To address these challenges, we propose a novel multimodal learning method with order-invariant evidence fusion and introduce a conflict-based discounting mechanism that reallocates uncertain mass when unreliable modalities are detected. We provide both theoretical analysis and experimental validation, demonstrating that, unlike previous work, the proposed approach effectively distinguishes between conflicting and non-conflicting samples based on the provided uncertainty estimates and outperforms previous models in uncertainty-based conflict detection.

Paper Structure

This paper contains 18 sections, 2 theorems, 21 equations, 7 figures, 7 tables, 1 algorithm.

Key Result

Proposition 1

When an averaging belief fusion operator is applied sequentially, the evidence associated with previously fused terms is reduced by a factor of two each time a new term is incorporated, relative to the evidence of the newly fused term.
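Proposition 1 can be checked numerically. The sketch below assumes opinions are built from evidence as $b_k = e_k/S$, $u = K/S$ with $S = \sum_k e_k + K$, applies the two-source averaging fusion operator from subjective logic sequentially, and compares the result with the multi-source (generalized) averaging formula. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Opinion = Tuple[List[float], float]  # (belief masses b_k, uncertainty u)


def to_opinion(evidence: List[float]) -> Opinion:
    """Evidence e_k -> opinion with b_k = e_k / S and u = K / S, where S = sum(e) + K."""
    k = len(evidence)
    s = sum(evidence) + k
    return [e / s for e in evidence], k / s


def to_evidence(op: Opinion) -> List[float]:
    """Inverse mapping: e_k = K * b_k / u (requires u > 0)."""
    belief, u = op
    return [len(belief) * b / u for b in belief]


def average_fuse_pair(a: Opinion, b: Opinion) -> Opinion:
    """Two-source averaging belief fusion from subjective logic (assumes u_A, u_B > 0)."""
    (ba, ua), (bb, ub) = a, b
    denom = ua + ub
    belief = [(x * ub + y * ua) / denom for x, y in zip(ba, bb)]
    return belief, 2 * ua * ub / denom


def sequential_fusion(ops: List[Opinion]) -> Opinion:
    """Fold the pairwise operator left to right: ((w1 + w2) + w3) + ..."""
    fused = ops[0]
    for op in ops[1:]:
        fused = average_fuse_pair(fused, op)
    return fused


def generalized_average(ops: List[Opinion]) -> Opinion:
    """Multi-source averaging: each source weighted by the product of the others' uncertainties."""
    us = [u for _, u in ops]
    weights = [math.prod(us[:i] + us[i + 1:]) for i in range(len(ops))]
    total = sum(weights)
    k = len(ops[0][0])
    belief = [sum(w * op[0][c] for w, op in zip(weights, ops)) / total for c in range(k)]
    return belief, len(ops) * math.prod(us) / total


sources = [to_opinion(e) for e in ([4, 0, 0], [0, 4, 0], [0, 0, 4])]
print(to_evidence(sequential_fusion(sources)))        # approx. [1.0, 1.0, 2.0]
print(to_evidence(sequential_fusion(sources[::-1])))  # approx. [2.0, 1.0, 1.0]
print(to_evidence(generalized_average(sources)))      # approx. [1.333, 1.333, 1.333]
```

In evidence space the pairwise operator simply averages the two evidence vectors, so fusing one-hot sources in the order 1, 2, 3 gives weights 1/4, 1/4, 1/2: the last source counts twice as much as each earlier one, and reversing the order changes the result. Generalized averaging weights all sources equally, independent of order.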

Figures (7)

  • Figure 1: The general pipeline of Discounted Belief Fusion. First, subjective opinions are formed for each modality using unimodal classifiers. Then the conflict matrix is computed and used to derive a discounting factor for each modality. Finally, the beliefs of each modality are adjusted using these discounting factors and fused with generalized belief averaging.
  • Figure 2: The conflict matrix $\bm{C}$ for 4 opinions.
  • Figure 3: Conflict versus agreement according to the updated discounting-factor equation. Higher values of $\lambda$ reduce discounting at lower levels of conflict.
  • Figure 4: Uncertainty distributions on normal and conflictive test sets using Belief Constraint Fusion (BCF), Belief Averaging Fusion (BAF), and Discounted Belief Fusion (DBF). $\lambda = 1$ is used for all datasets.
  • Figure 5: Average uncertainty values across different modalities (denoted as M1, M2, etc.) for CalTech, CUB, HandWritten, Scene, and PIE datasets. Error bars indicate the standard deviation, highlighting the low variability in uncertainty measurements.
  • ...and 2 more figures
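The pipeline in Figure 1 can be sketched end to end as follows. This is a minimal sketch under several assumptions: opinions come from evidence with a uniform base rate, the pairwise conflict is Jøsang's degree of conflict (projected distance times conjunctive certainty), discounting uses the standard subjective-logic form $b' = \eta b$, $u' = 1 - \eta(1-u)$, and the agreement mapping $1 - C^{\lambda}$ inside `discount_factors` is a hypothetical choice that only reproduces the qualitative behavior in Figure 3 (larger $\lambda$, less discounting at low conflict). It is not the paper's exact discounting equation, and all names are illustrative.

```python
import math
from typing import List, Tuple

Opinion = Tuple[List[float], float]  # (belief masses b_k, uncertainty u)


def to_opinion(evidence: List[float]) -> Opinion:
    k = len(evidence)
    s = sum(evidence) + k  # Dirichlet strength with alpha_k = e_k + 1
    return [e / s for e in evidence], k / s


def projected(op: Opinion) -> List[float]:
    belief, u = op
    return [b + u / len(belief) for b in belief]  # P(x) = b(x) + a(x) u, uniform a(x) = 1/K


def degree_of_conflict(a: Opinion, b: Opinion) -> float:
    pd = 0.5 * sum(abs(x - y) for x, y in zip(projected(a), projected(b)))  # projected distance
    cc = (1 - a[1]) * (1 - b[1])                                            # conjunctive certainty
    return pd * cc


def conflict_matrix(ops: List[Opinion]) -> List[List[float]]:
    n = len(ops)
    return [[0.0 if i == j else degree_of_conflict(ops[i], ops[j]) for j in range(n)]
            for i in range(n)]


def discount_factors(c: List[List[float]], lam: float = 1.0) -> List[float]:
    # HYPOTHETICAL agreement mapping 1 - C**lam, multiplied over all other modalities.
    n = len(c)
    return [math.prod(1 - c[v][w] ** lam for w in range(n) if w != v) for v in range(n)]


def discount(op: Opinion, eta: float) -> Opinion:
    belief, u = op
    return [eta * b for b in belief], 1 - eta * (1 - u)  # removed belief mass moves to u


def generalized_average(ops: List[Opinion]) -> Opinion:
    us = [u for _, u in ops]
    weights = [math.prod(us[:i] + us[i + 1:]) for i in range(len(ops))]
    total = sum(weights)
    k = len(ops[0][0])
    belief = [sum(w * op[0][c] for w, op in zip(weights, ops)) / total for c in range(k)]
    return belief, len(ops) * math.prod(us) / total


def dbf(evidences: List[List[float]], lam: float = 1.0) -> Opinion:
    ops = [to_opinion(e) for e in evidences]
    etas = discount_factors(conflict_matrix(ops), lam)
    return generalized_average([discount(o, eta) for o, eta in zip(ops, etas)])


agree = dbf([[10, 0, 0], [10, 0, 0], [10, 0, 0]])
clash = dbf([[10, 0, 0], [10, 0, 0], [0, 10, 0]])
print(agree[1], clash[1])  # fused uncertainty is higher when one modality conflicts
```

With three identical modalities the conflict matrix is zero, no discounting occurs, and the fused uncertainty equals the unimodal one ($3/13$ here). Replacing one modality with a conflicting one raises the fused uncertainty, while permuting the inputs leaves the result unchanged because generalized averaging treats all sources symmetrically.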

Theorems & Definitions (5)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Definition 1: Degree of Conflict
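For reference, subjective logic (Jøsang) defines the degree of conflict between two opinions $\omega_A$ and $\omega_B$ as the product of their projected distance and their conjunctive certainty. That Definition 1 uses exactly this form is an assumption here; the base rates $a(x)$ are typically uniform, $a(x) = 1/K$.

```latex
\begin{aligned}
P_A(x) &= b_A(x) + a_A(x)\,u_A, \\
\mathrm{PD}(\omega_A, \omega_B) &= \tfrac{1}{2} \sum_{x \in \mathbb{X}} \bigl| P_A(x) - P_B(x) \bigr|, \\
\mathrm{CC}(\omega_A, \omega_B) &= (1 - u_A)(1 - u_B), \\
\mathrm{DC}(\omega_A, \omega_B) &= \mathrm{PD}(\omega_A, \omega_B) \cdot \mathrm{CC}(\omega_A, \omega_B).
\end{aligned}
```

Under this form, DC is 0 when the projected probabilities coincide or when either opinion is vacuous ($u = 1$), and it reaches 1 only when both opinions are dogmatic ($u = 0$) and place all their probability on different outcomes.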