Multi-QuAD: Multi-Level Quality-Adaptive Dynamic Network for Reliable Multimodal Classification

Shu Shen, C. L. Philip Chen, Tong Zhang

TL;DR

Multi-QuAD addresses reliability in multimodal classification under varying data quality. It introduces Noise-free Prototype Confidence Estimation (NFCE), which estimates sample quality from noise-free prototypes without relying on classifiers, and two dynamic mechanisms: Global Confidence Normalized Depth (GCND) for sample-specific network depth and Layer-wise Greedy Parameter (LGP) for cross-modality parameter prediction. By normalizing depth across modalities and enabling sample-adaptive parameters, Multi-QuAD achieves higher accuracy and robustness across four diverse datasets, especially under Gaussian noise, while maintaining computational efficiency. The approach offers reliable, multi-level quality assessment and dynamic adaptation, advancing trustworthy inference in multimodal systems. This framework has practical implications for safety-critical applications where data quality is unpredictable and dependable, well-calibrated performance is essential.

Abstract

Multimodal machine learning has achieved remarkable progress in many scenarios, but its reliability is undermined by varying sample quality. This paper finds that existing reliable multimodal classification methods not only fail to provide robust estimation of data quality, but also lack dynamic networks for sample-specific depth and parameters to achieve reliable inference. To this end, a novel framework for multimodal reliable classification termed Multi-level Quality-Adaptive Dynamic multimodal network (Multi-QuAD) is proposed. Multi-QuAD first adopts a novel approach based on noise-free prototypes and a classifier-free design to reliably estimate the quality of each sample at both modality and feature levels. It then achieves sample-specific network depth via the Global Confidence Normalized Depth (GCND) mechanism. By normalizing depth across modalities and samples, GCND effectively mitigates the impact of challenging modality inputs on dynamic depth reliability. Furthermore, Multi-QuAD provides sample-adaptive network parameters via the Layer-wise Greedy Parameter (LGP) mechanism driven by feature-level quality. The cross-modality layer-wise greedy strategy in LGP designs a reliable parameter prediction paradigm for multimodal networks with variable architecture for the first time. Experiments conducted on four datasets demonstrate that Multi-QuAD significantly outperforms state-of-the-art methods in classification performance and reliability, exhibiting strong adaptability to data with diverse quality.

Paper Structure

This paper contains 35 sections, 11 equations, 11 figures, 7 tables, 1 algorithm.

Figures (11)

  • Figure 1: Empirical studies under varying data quality. We simulate data quality degradation by adding Gaussian noise to one of the modalities on the BRCA dataset, with $\sigma$ representing the noise intensity. (a) Classification accuracy (ACC) of three state-of-the-art methods (GCFANet [zheng2024global], PDF [pmlr-v235-cao24c], CALM [zhou2023calm]) at different depths of the unimodal network that processes the noisy modality. (b) Visualization of the model parameters required to map samples with different features to their corresponding class centers. (c) Confidence estimates on the same test samples produced by TCP [han2022multimodal], used as a representative example, after training on data with different noise intensities. More details of the observation experiments are included in the Supplemental Materials.
  • Figure 2: The framework of the proposed Multi-QuAD (better viewed in colour). Without loss of generality, this figure illustrates the case of two modalities, with blue and yellow representing different modalities. For an input sample $x_i$ from dataset $\mathcal{D}$: (a) the modality-level and feature-level quality of $x_i$ is estimated via Noise-free Prototype Confidence Estimation (NFCE). (b) The reliable depth of Multi-QuAD for $x_i$ is adjusted by Global Confidence Normalized Depth (GCND) based on modality-level quality. (c) The reliable parameters of Multi-QuAD for $x_i$ are adjusted by Layer-wise Greedy Parameter (LGP) driven by feature-level quality. The detailed implementation of $\textbf{LGP}^t$ at each layer is shown in Fig. 3.
  • Figure 3: The detailed implementation of cross-modality greedy parameter prediction at the $t$-th layer ($\textbf{LGP}^t$).
  • Figure 4: Comparison of classification accuracy of different models under Gaussian noise with different intensities $\sigma$.
  • Figure 5: Ablation study to demonstrate the effectiveness of GCND.
  • ...and 6 more figures
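
The noise protocol mentioned in the captions of Figures 1 and 4 (Gaussian noise of intensity $\sigma$ added to one modality) can be illustrated with a minimal sketch. This is not the authors' code: the function name `add_gaussian_noise`, the list-of-arrays layout, and the choice of zero-mean noise with standard deviation exactly `sigma` (with no rescaling by feature statistics) are assumptions made for illustration only.

```python
import numpy as np


def add_gaussian_noise(modalities, noisy_index, sigma, seed=0):
    """Return copies of the modality arrays with noise added to one of them.

    Hypothetical helper (not from the paper). Assumes each modality is an
    array of shape (num_samples, num_features) and that the noise is
    zero-mean Gaussian with standard deviation `sigma`.
    """
    rng = np.random.default_rng(seed)
    # Copy every modality so the caller's clean data is left untouched.
    noisy = [np.array(m, dtype=float, copy=True) for m in modalities]
    target = noisy[noisy_index]
    # Perturb only the selected modality; the others stay clean.
    target += sigma * rng.standard_normal(target.shape)
    return noisy


if __name__ == "__main__":
    # Two toy modalities standing in for real multimodal features.
    modality_a = np.zeros((4, 3))
    modality_b = np.ones((4, 2))
    for sigma in (0.0, 1.0, 5.0):
        noisy_a, clean_b = add_gaussian_noise(
            [modality_a, modality_b], noisy_index=0, sigma=sigma
        )
        print(sigma, float(np.abs(noisy_a).mean()), bool((clean_b == 1).all()))
```

Sweeping `sigma` in this way yields test sets of progressively lower quality in a single modality, which is the kind of setting under which the figures compare accuracy.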