MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models

Yanting Wang; Hongye Fu; Wei Zou; Jinyuan Jia

TL;DR

This work tackles the vulnerability of multi-modal models to adversarial perturbations across modalities by introducing MMCert, the first certified defense for multi-modal inputs. MMCert constructs an ensemble through independent sub-sampling of basic elements from each modality and provides provable robustness guarantees under per-modality $l_0$-like attacks using Neyman–Pearson certification, Monte Carlo estimation, and Clopper–Pearson bounds. It extends certification to both classification and segmentation tasks, employing Holm–Bonferroni to maximize the number of certifiable elements in segmentation. Empirical evaluation on the RAVDESS emotion recognition and KITTI Road segmentation benchmarks shows that MMCert substantially improves certified robustness compared to a state-of-the-art unimodal baseline extended to multi-modal inputs, demonstrating practical impact for safety-critical systems requiring provable guarantees.
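The pipeline described above (independent sub-sampling per modality, Monte Carlo voting, and a Clopper–Pearson lower bound on the top-label probability) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `subsample`, `estimate_top_label`, and `toy_classifier`, the list-based representation of a modality, and the choice to reuse the same samples for label selection and estimation are all assumptions made for this example. It also stops at the probability bound and does not compute the certified perturbation sizes.

```python
import math
import random
from collections import Counter


def subsample(elements, k, rng):
    """Keep k basic elements (with their indices), chosen uniformly without replacement."""
    kept = sorted(rng.sample(range(len(elements)), k))
    return [(i, elements[i]) for i in kept]


def binom_tail_ge(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p) with 0 < p < 1, summed in log space."""
    total = 0.0
    for i in range(k, n + 1):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log(1.0 - p))
        total += math.exp(log_term)
    return total


def clopper_pearson_lower(k, n, alpha):
    """One-sided Clopper-Pearson lower bound: largest p with P[X >= k | p] <= alpha."""
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(50):  # bisection; the tail probability is increasing in p
        mid = (lo + hi) / 2.0
        if binom_tail_ge(k, n, mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo  # returns the conservative end of the final bracket


def estimate_top_label(base_classifier, modalities, ks, n_trials, alpha, seed=0):
    """Vote over independently sub-sampled inputs and lower-bound the top label's probability."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_trials):
        # Each modality is sub-sampled independently with its own k_i.
        sub = [subsample(m, k, rng) for m, k in zip(modalities, ks)]
        votes[base_classifier(sub)] += 1
    label, count = votes.most_common(1)[0]
    return label, clopper_pearson_lower(count, n_trials, alpha)


def toy_classifier(sub):
    """Hypothetical stand-in for a base multi-modal classifier g."""
    total = sum(value for modality in sub for _, value in modality)
    return int(total > 0)


# Two toy "modalities" with 20 and 30 basic elements; keep 5 and 8 per draw.
label, p_lower = estimate_top_label(
    toy_classifier, [[1] * 20, [2] * 30], ks=(5, 8), n_trials=200, alpha=0.001
)
print(label, p_lower)
```

The bound is computed with the standard library only (bisection on the binomial tail) so the sketch has no third-party dependencies; a statistics library's beta quantile function would be the more usual route.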

Abstract

Unlike a unimodal model, whose input comes from a single modality, a multi-modal model takes an input (called a multi-modal input) drawn from multiple modalities such as images, 3D points, audio, and text. Many existing studies show that multi-modal models, like unimodal ones, are vulnerable to adversarial perturbations: an attacker can add small perturbations to all modalities of a multi-modal input so that the model makes an incorrect prediction for it. Existing certified defenses are mostly designed for unimodal models and achieve sub-optimal certified robustness guarantees when extended to multi-modal models, as our experimental results show. We propose MMCert, the first certified defense against adversarial attacks to a multi-modal model. We derive a lower bound on the performance of MMCert under arbitrary adversarial attacks with bounded perturbations to both modalities (e.g., in autonomous driving, we bound the number of changed pixels in both the RGB image and the depth image). We evaluate MMCert on two benchmark datasets: one for multi-modal road segmentation and the other for multi-modal emotion recognition. We also compare MMCert with a state-of-the-art certified defense extended from unimodal models. Our experimental results show that MMCert outperforms the baseline.
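For the segmentation task, the TL;DR notes that Holm–Bonferroni is used to maximize the number of certifiable elements. The generic step-down procedure can be sketched as below and compared with plain Bonferroni. How a per-pixel p-value would be derived is not stated in this summary, so the p-values here are made-up inputs, and the function names `holm_bonferroni` and `bonferroni` are illustrative.

```python
def holm_bonferroni(p_values, alpha):
    """Holm's step-down procedure: returns one accept/reject flag per hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    rejected = [False] * m
    for rank, idx in enumerate(order):
        # The threshold relaxes from alpha/m up to alpha as hypotheses are rejected.
        if p_values[idx] <= alpha / (m - rank):
            rejected[idx] = True
        else:
            break  # stop at the first failure; all larger p-values are kept
    return rejected


def bonferroni(p_values, alpha):
    """Plain Bonferroni correction, shown only for comparison."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


# Hypothetical p-values, e.g. one per pixel whose prediction we try to certify.
p_values = [0.01, 0.04, 0.015, 0.005]
print(sum(holm_bonferroni(p_values, 0.05)), sum(bonferroni(p_values, 0.05)))
```

Both procedures control the family-wise error rate at `alpha`, but Holm's version never rejects fewer hypotheses than Bonferroni, which is why it can certify more elements at the same confidence level.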

Paper Structure (20 sections, 2 theorems, 19 equations, 9 figures)

Introduction
Background and Related Work
Adversarial Attacks to Multi-modal Models
Existing Defenses
Problem Formulation
Threat Model
Certifiably Robust Multi-modal Prediction
Our Design
Independent Sub-sampling
Certify Multi-modal Classification
Certify Multi-modal Segmentation
Evaluation
Experimental Setup
Experimental Results
Conclusion
...and 5 more sections

Key Result

Theorem 1

Suppose we have a multi-modal test input $\mathbf{M}$ and a base multi-modal classifier $g$. Our ensemble classifier $G$ is defined as above. We denote $A = G(\mathbf{M})$ and use $\underline{p_A}$ to denote the label probability lower bound for the label $A$. We use $B$ to denote the runner-up c if: where $e_i=n_i-r_i$ and $n'_i=n_i$ for modification attack; $e_i=n_i$ and $n'_i=n_i+r_i$ for a

Figures (9)

Figure 1: Compare our MMCert with randomized ablation on RAVDESS Dataset.
Figure 2: Compare our MMCert with randomized ablation on KITTI Road Dataset. Certified Pixel Accuracy (first row), Certified F-score (second row) and Certified IoU (third row) are considered.
Figure 3: Compare different attack types on RAVDESS Dataset.
Figure 4: Impact of the ratio between $k_1$ and $k_2$. Certified Pixel Accuracy (first row), Certified F-score (second row) and Certified IoU (third row) are considered.
Figure 5: Compare our MMCert with randomized ablation on RAVDESS Dataset.
...and 4 more figures

Theorems & Definitions (4)

Theorem 1: Certification for classification
proof
Lemma 1: Neyman Pearson
proof
