Multi-QuAD: Multi-Level Quality-Adaptive Dynamic Network for Reliable Multimodal Classification
Shu Shen, C. L. Philip Chen, Tong Zhang
TL;DR
Multi-QuAD targets reliable multimodal classification when sample quality varies. It introduces NFCE, which estimates sample quality from noise-free prototypes without relying on classifiers, and two dynamic mechanisms: GCND for sample-specific network depth and LGP for cross-modality parameter prediction. By normalizing depth across modalities and making parameters sample-adaptive, Multi-QuAD achieves higher accuracy and robustness on four diverse datasets, especially under Gaussian noise, while maintaining computational efficiency. Its multi-level quality assessment and dynamic adaptation are relevant to safety-critical applications where data quality is unpredictable and dependable, well-calibrated performance is essential.
Abstract
Multimodal machine learning has achieved remarkable progress in many scenarios, but its reliability is undermined by varying sample quality. This paper finds that existing reliable multimodal classification methods not only fail to estimate data quality robustly, but also lack dynamic networks that adapt depth and parameters to each sample for reliable inference. To this end, a novel framework for reliable multimodal classification, termed the Multi-level Quality-Adaptive Dynamic multimodal network (Multi-QuAD), is proposed. Multi-QuAD first adopts a novel approach based on noise-free prototypes and a classifier-free design to reliably estimate the quality of each sample at both the modality and feature levels. It then achieves sample-specific network depth via the Global Confidence Normalized Depth (GCND) mechanism. By normalizing depth across modalities and samples, GCND mitigates the impact of challenging modality inputs on the reliability of dynamic depth. Furthermore, Multi-QuAD provides sample-adaptive network parameters via the Layer-wise Greedy Parameter (LGP) mechanism, driven by feature-level quality. The cross-modality layer-wise greedy strategy in LGP provides, for the first time, a reliable parameter prediction paradigm for multimodal networks with variable architecture. Experiments on four datasets demonstrate that Multi-QuAD significantly outperforms state-of-the-art methods in classification performance and reliability, exhibiting strong adaptability to data of diverse quality.
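
To make the two quality-driven ideas above concrete, the following is a minimal illustrative sketch and not the authors' implementation. The abstract does not give formulas, so everything here is an assumption: confidence is taken as the maximum softmax probability over cosine similarities to class prototypes (assumed to be precomputed from clean training features), and depth is obtained by min-max normalizing confidence over all modalities and samples, with lower confidence mapped to more layers. The function names `prototype_confidence` and `normalized_depths` and the `temperature` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def prototype_confidence(feature, prototypes, temperature=0.1):
    """Hypothetical classifier-free confidence score.

    Assumption: `prototypes` holds one vector per class, built from
    clean (noise-free) training features. The score is the largest
    softmax probability over temperature-scaled cosine similarities.
    """
    sims = [cosine(feature, p) / temperature for p in prototypes]
    top = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in sims]
    return max(exps) / sum(exps)


def normalized_depths(confidences, max_depth):
    """Hypothetical confidence-to-depth mapping.

    `confidences[m][i]` is the confidence of modality m for sample i.
    Assumption: confidence is min-max normalized over ALL modalities
    and samples, and lower confidence receives more layers.
    """
    flat = [c for row in confidences for c in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero when all are equal
    return [
        [1 + round((1 - (c - lo) / span) * (max_depth - 1)) for c in row]
        for row in confidences
    ]


# Usage: a feature aligned with one prototype scores higher than an
# ambiguous feature that is equally close to both prototypes.
prototypes = [[1.0, 0.0], [0.0, 1.0]]
clean_conf = prototype_confidence([1.0, 0.0], prototypes)
noisy_conf = prototype_confidence([1.0, 1.0], prototypes)

# Two modalities, two samples each; depths range from 1 to max_depth.
depths = normalized_depths([[0.9, 0.5], [0.1, 0.9]], max_depth=5)
```

Normalizing over the pooled set of confidences, rather than per modality, is one way to keep a single hard modality from skewing the depth assigned to the others; whether GCND does exactly this is not stated in the abstract.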
