Credibility-Aware Multi-Modal Fusion Using Probabilistic Circuits

Sahil Sidheekh, Pranuthi Tenali, Saurabh Mathur, Erik Blasch, Kristian Kersting, Sriraam Natarajan

TL;DR

The paper tackles credibility-aware late fusion for noisy, multi-modal data by modeling the joint distribution of unimodal predictions and the target with Probabilistic Circuits (PCs). It defines a principled credibility measure based on divergence and conditional entropy, and introduces two fusion variants: Direct-PC (DPC) and Credibility-Weighted Mean (CWM). The authors establish that PCs enable tractable inference for predictive and credibility queries and demonstrate competitive performance across multiple datasets (AV-MNIST, CUB, NYUD, SUNRGBD) while providing reliable modality credibility estimates. The work offers a principled, robust, and scalable approach to multi-modal fusion with explicit uncertainty and source reliability considerations, with potential impact on safety-critical applications.

Abstract

We consider the problem of late multi-modal fusion for discriminative learning. Motivated by noisy, multi-source domains that require understanding the reliability of each data source, we explore the notion of credibility in the context of multi-modal fusion. We propose a combination function that uses probabilistic circuits (PCs) to combine predictive distributions over individual modalities. We also define a probabilistic measure to evaluate the credibility of each modality via inference queries over the PC. Our experimental evaluation demonstrates that our fusion method can reliably infer credibility while maintaining competitive performance with the state-of-the-art.
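To make the late-fusion idea concrete, here is a minimal sketch of a credibility-weighted mean over unimodal predictive distributions. It is an illustration under stated assumptions, not the authors' implementation. The function name `credibility_weighted_mean` is hypothetical, and the credibility scores are taken as given non-negative inputs. In the paper they come from inference queries over the PC, which this sketch does not model.

```python
from typing import List, Sequence


def credibility_weighted_mean(
    preds: Sequence[Sequence[float]],
    credibility: Sequence[float],
) -> List[float]:
    """Fuse unimodal predictive distributions over Y by a weighted mean.

    preds:        one distribution over the classes of Y per modality.
    credibility:  one non-negative score per modality (ASSUMED to be given;
                  the paper derives these from queries over a PC).
    """
    if len(preds) == 0 or len(preds) != len(credibility):
        raise ValueError("need one credibility score per modality")
    if any(c < 0 for c in credibility):
        raise ValueError("credibility scores must be non-negative")
    total = sum(credibility)
    if total <= 0:
        raise ValueError("at least one credibility score must be positive")
    num_classes = len(preds[0])
    if any(len(p) != num_classes for p in preds):
        raise ValueError("all modalities must predict over the same classes")

    # Normalize the scores so they act as convex-combination weights.
    weights = [c / total for c in credibility]

    # Weighted mean of the per-modality class probabilities.
    fused = [
        sum(w * p[k] for w, p in zip(weights, preds))
        for k in range(num_classes)
    ]

    # Renormalize to guard against inputs that do not sum exactly to 1.
    norm = sum(fused)
    return [f / norm for f in fused]


# Hypothetical two-modality example: a confident image predictor that is
# judged credible, and a noisy audio predictor that is not.
p_image = [0.7, 0.2, 0.1]
p_audio = [0.1, 0.1, 0.8]
fused = credibility_weighted_mean([p_image, p_audio], credibility=[0.9, 0.1])
predicted_class = max(range(len(fused)), key=fused.__getitem__)
print(fused, predicted_class)
```

With these illustrative numbers the fused distribution stays close to the image predictor, because the audio modality is down-weighted by its low credibility score.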

Paper Structure (15 sections, 4 theorems, 22 equations, 4 figures, 5 tables)

Key Result

Theorem 3.1

The expected credibility $\mathcal{C}^{j}$ of a modality $j$ in predicting the target $Y$, under a Marginal Dominant distribution, is lower bounded by the negative of the conditional entropy ($\mathbb{H}$) of the unimodal predictive distribution of modality $j$ over $Y$, given the predictive distribu

Figures (4)

  • Figure 1: Model Diagram for our proposed PC-based fusion method. Each input modality $\mathbf{X}_i$ is processed by a unimodal predictor $\mathcal{M}_{\phi_i}$ to get the corresponding predictive distribution $\mathbf{p}_i$ over the target $Y$. A probabilistic circuit $\theta$ is used to model the joint distribution over the unimodal predictive distributions and $Y$, and the final prediction is obtained by running an inference routine over it, governed by the form of fusion function employed ($\mathcal{M}_{\theta}$).
  • Figure 2: Mean Validation Relative Credibility obtained using a PC for the two modalities of the AV-MNIST dataset across training epochs. Varying degrees of noise (controlled by $\lambda$) are introduced into the audio modality.
  • Figure 3: Mean Test Relative Credibility output by a PC for the two modalities of the AV-MNIST dataset across varying degrees of noise (controlled by $\lambda$) introduced into each modality.
  • Figure 4: Robustness to Noise. Mean test performance of late fusion methods across varying degrees of noise.

Theorems & Definitions (10)

  • Definition 1
  • Definition 2
  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof
  • Theorem A.1
  • proof
  • Theorem A.2
  • proof