Uncertainty Quantification via Hölder Divergence for Multi-View Representation Learning

Yan Zhang, Ming Li, Chun Li, Zhaoxia Liu, Ye Zhang, Fei Richard Yu

TL;DR

This work tackles uncertainty in multi-view representation learning by introducing HDMVL, an uncertainty-aware framework built on Hölder divergence (HD), Dirichlet modeling, and Dempster–Shafer (DS) fusion. By replacing the traditional KL objective with Hölder divergence in a variational Dirichlet backbone, the method provides calibrated uncertainty estimates and improved fusion across modalities. The authors derive an HPD-based ELBO for Dirichlet distributions, incorporate a pseudo-view for richer cross-view interactions, and fuse evidence via DS theory, achieving state-of-the-art results on RGB-D classification and multi-view clustering benchmarks. The approach is robust to incomplete or noisy data and yields gains in recognition and clustering accuracy, highlighting the value of HD for uncertainty quantification in multi-view learning.

Abstract

Evidence-based deep learning is an emerging paradigm for uncertainty estimation that offers reliable predictions with negligible extra computational overhead. Existing methods usually adopt the Kullback-Leibler divergence to estimate the uncertainty of network predictions, ignoring domain gaps among modalities. To tackle this issue, this paper introduces a novel algorithm based on Hölder Divergence (HD) that enhances the reliability of multi-view learning by addressing the uncertainty arising from incomplete or noisy data. Our method extracts the representations of multiple modalities through parallel network branches and then employs HD to estimate the prediction uncertainties. Through Dempster-Shafer theory, the uncertainties from different modalities are integrated, generating a comprehensive result that considers all available representations. Mathematically, HD is shown to better measure the "distance" between the real data distribution and the predictive distribution of the model, and to improve performance on multi-class recognition tasks. Specifically, our method surpasses existing state-of-the-art counterparts on all evaluated benchmarks. We further conduct extensive experiments with different backbones to verify its robustness, and the results show that our method pushes the corresponding performance boundaries. Finally, we perform experiments in more challenging scenarios, i.e., learning with incomplete or noisy data, revealing that our method exhibits a high tolerance to such corrupted data.
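The fusion step described in the abstract can be illustrated with a minimal sketch. It assumes the subjective-logic formulation commonly used in evidential multi-view classification (Dirichlet parameters equal to evidence plus one, belief equal to evidence divided by Dirichlet strength, uncertainty equal to the number of classes divided by strength) and the reduced Dempster combination rule for two opinions. The function names `opinion_from_evidence` and `ds_combine`, as well as the example evidence values, are hypothetical and not taken from the paper, and the Hölder-divergence training objective is not modeled here.

```python
import numpy as np


def opinion_from_evidence(evidence):
    """Turn non-negative per-class evidence into (belief, uncertainty).

    Assumption: alpha = evidence + 1, S = sum(alpha),
    belief_k = evidence_k / S, uncertainty = K / S.
    """
    evidence = np.asarray(evidence, dtype=float)
    num_classes = evidence.shape[-1]
    alpha = evidence + 1.0
    strength = alpha.sum()
    belief = evidence / strength
    uncertainty = num_classes / strength
    return belief, uncertainty


def ds_combine(b1, u1, b2, u2):
    """Combine two opinions with the reduced Dempster rule.

    conflict sums b1_i * b2_j over all pairs with i != j.
    """
    conflict = np.sum(np.outer(b1, b2)) - np.sum(b1 * b2)
    scale = 1.0 / (1.0 - conflict)
    belief = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    uncertainty = scale * u1 * u2
    return belief, uncertainty


# Hypothetical evidence for a 3-class problem: a confident RGB view
# and a weakly informative depth view.
rgb_belief, rgb_u = opinion_from_evidence([9.0, 1.0, 0.0])
depth_belief, depth_u = opinion_from_evidence([0.5, 0.5, 0.0])

fused_belief, fused_u = ds_combine(rgb_belief, rgb_u, depth_belief, depth_u)
print("fused belief:", fused_belief)
print("fused uncertainty:", fused_u)
```

In this sketch the fused belief masses and the fused uncertainty still sum to one, and the confident view dominates the combined opinion because the uncertain view contributes little belief mass of its own.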

Paper Structure

This paper contains 24 sections, 5 theorems, 24 equations, 3 figures, 8 tables, 1 algorithm.

Key Result

Lemma 1

(HPD and PHD for Conic or Affine Exponential Family) For distributions $p(x;\theta_p)$ and $p(x;\theta_q)$ that belong to the same exponential family with a conic or affine natural parameter space, both the HPD and PHD can be expressed in closed form in terms of the log-normalizer $F(\theta)$, a strictly convex function also referred to as the cumulant generating function.
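The closed-form expressions referred to in Lemma 1 are not reproduced in this summary. As a hedged illustration, the following sketch derives the HPD case from the standard definition of the Hölder pseudo-divergence. The conjugate exponents $\alpha, \beta > 0$ with $1/\alpha + 1/\beta = 1$ are notation assumed here rather than taken from the page, and the derivation further assumes a zero carrier term and that $\alpha\theta_p$, $\beta\theta_q$, and $\theta_p + \theta_q$ all lie in the natural parameter space (which is what the conic or affine condition is meant to guarantee).

```latex
% Assumed definition (conjugate exponents 1/\alpha + 1/\beta = 1):
D^{H}_{\alpha}(p:q)
  = -\log \frac{\int p(x)\, q(x)\, dx}
               {\left(\int p(x)^{\alpha}\, dx\right)^{1/\alpha}
                \left(\int q(x)^{\beta}\, dx\right)^{1/\beta}}

% Exponential family with zero carrier term:
p(x;\theta) = \exp\!\big(\langle \theta, t(x) \rangle - F(\theta)\big)

% Integrals needed, each obtained from \int \exp(\langle\theta,t(x)\rangle)\,dx = e^{F(\theta)}:
\int p(x;\theta_p)^{\alpha}\, dx = \exp\!\big(F(\alpha\theta_p) - \alpha F(\theta_p)\big)
\int q(x;\theta_q)^{\beta}\, dx  = \exp\!\big(F(\beta\theta_q) - \beta F(\theta_q)\big)
\int p(x;\theta_p)\, q(x;\theta_q)\, dx
  = \exp\!\big(F(\theta_p + \theta_q) - F(\theta_p) - F(\theta_q)\big)

% Substituting, the F(\theta_p) and F(\theta_q) terms cancel:
D^{H}_{\alpha}(p:q)
  = \frac{1}{\alpha} F(\alpha\theta_p)
  + \frac{1}{\beta} F(\beta\theta_q)
  - F(\theta_p + \theta_q)
```

Because $F$ is strictly convex, the resulting expression depends only on evaluations of the log-normalizer, which is what makes a divergence of this type tractable for families such as the Dirichlet. The PHD case is not reconstructed here.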

Figures (3)

  • Figure 1: The confident (a-d) and uncertain (e-h) sample-depth pairs predicted by our method on the SUNRGBD dataset. The comparison reveals the discrepancies between high-confidence and uncertain predictions, demonstrating the capacity of our method to handle challenging cases.
  • Figure 2: Overview of Uncertainty Estimation via Hölder Divergence for Multi-View Representation Learning. The image features from different modalities are extracted and classified by three separately trained networks. Then, the reliability (${b_k}$) and uncertainty ($\boldsymbol{\mu}$) of the classification results are estimated using Hölder Divergence (HD). Finally, modal fusion is performed based on the reliability and uncertainty sets ${{\rm M}^i}$, where $i$ represents the modality index. This figure illustrates the process of uncertainty quantification and fusion in multi-view learning.
  • Figure 3: Overview of uncertainty estimation using Hölder divergence for multi-view representation learning. The figure presents t-SNE visualizations of multi-view clustering results across different datasets: (a) Caltech101-7, (b) Caltech101-20, and (c) MSRC-V1. These visualizations demonstrate how our model's uncertainty quantification, based on Hölder divergence, improves clustering performance. Additionally, the figure provides a comparative analysis, highlighting the enhanced separation of clusters and the robustness of our approach across diverse datasets.

Theorems & Definitions (13)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • Theorem 3
  • ...and 3 more