MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Yanting Wang, Hongye Fu, Wei Zou, Jinyuan Jia
TL;DR
This work tackles the vulnerability of multi-modal models to adversarial perturbations across modalities by introducing MMCert, the first certified defense for multi-modal inputs. MMCert constructs an ensemble by independently sub-sampling basic elements from each modality, and it provides provable robustness guarantees under per-modality $\ell_0$-like attacks using Neyman–Pearson certification, Monte Carlo estimation, and Clopper–Pearson bounds. It extends certification to both classification and segmentation tasks, employing the Holm–Bonferroni method to maximize the number of certifiable elements in segmentation. Empirical evaluation on the RAVDESS emotion recognition and KITTI Road segmentation benchmarks shows that MMCert substantially improves certified robustness over a state-of-the-art unimodal baseline extended to multi-modal inputs, demonstrating practical impact for safety-critical systems that require provable guarantees.
Abstract
Unlike a unimodal model, whose input comes from a single modality, a multi-modal model takes an input (called a multi-modal input) drawn from multiple modalities such as images, 3D points, audio, and text. Many existing studies show that, like unimodal models, multi-modal models are vulnerable to adversarial perturbations: an attacker can add small perturbations to all modalities of a multi-modal input so that the model makes an incorrect prediction for it. Existing certified defenses are mostly designed for unimodal models and achieve sub-optimal certified robustness guarantees when extended to multi-modal models, as our experimental results show. In this work, we propose MMCert, the first certified defense against adversarial attacks on multi-modal models. We derive a lower bound on the performance of MMCert under arbitrary adversarial attacks with bounded perturbations to both modalities (e.g., in the context of autonomous driving, we bound the number of changed pixels in both the RGB image and the depth image). We evaluate MMCert on two benchmark datasets: one for multi-modal road segmentation and the other for multi-modal emotion recognition. We also compare MMCert with a state-of-the-art certified defense extended from unimodal models. Our experimental results show that MMCert outperforms this baseline.
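
To make the ensemble idea concrete, the following is a minimal Python sketch of independent per-modality sub-sampling with a Monte Carlo vote and a one-sided Clopper–Pearson lower bound on the top label's probability. It is an illustration under stated assumptions, not the authors' implementation: the names `subsample`, `ensemble_predict`, and `toy_classifier`, the parameters `k_a`, `k_b`, `n_samples`, and `alpha`, and the toy inputs are all hypothetical. The step that converts the probability bound into certified perturbation sizes is not described in this section and is omitted. For simplicity, the same samples are used both to pick the label and to bound its probability.

```python
import math
import random
from collections import Counter


def subsample(elements, k, rng):
    """Keep k randomly chosen elements (without replacement), preserving order."""
    keep = sorted(rng.sample(range(len(elements)), k))
    return [elements[i] for i in keep]


def binom_tail_ge(n, k, p):
    """P[Binomial(n, p) >= k], summed in log space for numerical stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_term)
    return min(total, 1.0)


def clopper_pearson_lower(k, n, alpha):
    """One-sided Clopper-Pearson lower bound for k successes in n trials."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: the tail probability increases with p
        mid = (lo + hi) / 2
        if binom_tail_ge(n, k, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo  # slightly below the exact bound, so it stays conservative


def ensemble_predict(base_classifier, modality_a, modality_b, k_a, k_b,
                     n_samples=200, alpha=0.001, seed=0):
    """Majority vote over independently sub-sampled inputs from two modalities."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        sub_a = subsample(modality_a, k_a, rng)  # sub-sample modality A
        sub_b = subsample(modality_b, k_b, rng)  # independently sub-sample modality B
        votes[base_classifier(sub_a, sub_b)] += 1
    label, count = votes.most_common(1)[0]
    return label, clopper_pearson_lower(count, n_samples, alpha)


def toy_classifier(sub_a, sub_b):
    """Hypothetical stand-in for a multi-modal base model."""
    return int(sum(sub_a) + sum(sub_b) > 0)


label, p_lower = ensemble_predict(toy_classifier, [1.0] * 50, [1.0] * 30, k_a=10, k_b=5)
print(label, round(p_lower, 4))
```

The design point this sketch is meant to show is that each modality has its own sub-sampling size (`k_a`, `k_b`), so the ensemble can be tuned per modality. A rigorous pipeline would typically use separate samples for selecting the label and for estimating its probability.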
