CrossCert: A Cross-Checking Detection Approach to Patch Robustness Certification for Deep Learning Models

Qilin Zhou; Zhengyuan Wei; Haipeng Wang; Bo Jiang; W. K. Chan

TL;DR

This work addresses patch robustness certification for deep learning models by introducing CrossCert, a cross-checking framework that combines two certified recovery defenders to provide both unwavering certification and detection certification. By pairing a masking-based recovery defender with a voting-based one, CrossCert derives two guarantees: for a certifiably unwavering sample, the benign label persists under any patch perturbation and no warning is raised; for a certifiably detectable sample, any malicious version whose recovered label differs from the benign one triggers a warning. The authors give formal definitions (unwavering certification and detection certification) and theoretical guarantees (the consistency and intersection theorems), supported by a base framework built on a revised PatchCleanser and by the analytical constructs ATT_R1 and ATT_R2. Empirically, CrossCert achieves substantial certifiably unwavering accuracy and competitive certifiable detection accuracy on ImageNet, CIFAR-10, and CIFAR-100 relative to state-of-the-art detectors, with ViT backbones generally performing better. The cross-checking strategy suggests directions for integrating recovery and detection semantics in certification frameworks.

Abstract

Patch robustness certification is an emerging kind of defense technique against adversarial patch attacks with provable guarantees. There are two research lines: certified recovery and certified detection. Certified recovery aims to label malicious samples correctly with provable guarantees, whereas certified detection aims to issue warnings, with provable guarantees, for malicious samples predicted to non-benign labels. However, existing certified detection defenders cannot protect labels from manipulation, and existing certified recovery defenders cannot systematically issue warnings about the labels of samples. A certified defense that simultaneously offers robust labels and systematic warning protection against patch attacks is desirable. This paper proposes a novel certified defense technique called CrossCert. CrossCert cross-checks two certified recovery defenders to provide unwavering certification and detection certification. Unwavering certification ensures, with a provable guarantee, that a certified sample subjected to a patch perturbation will always be returned with its benign label without triggering any warning. To our knowledge, CrossCert is the first certified detection technique to offer this guarantee. Our experiments show that, with slightly lower performance than ViP and performance comparable to PatchCensor in terms of detection certification, CrossCert certifies a significant proportion of samples with the guarantee of unwavering certification.

Paper Structure (36 sections, 7 theorems, 3 figures, 4 tables, 2 algorithms)

Introduction
Preliminaries
Classification Model and Patch Attacker
Certified Defense against Patch Attacks: Detection and Recovery
Certified Detection
Certified Recovery
Voting-Based Recovery, and Masking-Based Detection and Recovery
Voting-Based Recovery
Masking-Based Detection and Recovery
Critical Limitation
CrossCert
Overview
Unwavering Certification
Design of the Base Framework: CrossCert-base
CrossCert Framework Design
...and 21 more sections

Key Result

Theorem 1

Given a sample x and two recovery defenders $R_1=\langle g_1, v_1, c_1\rangle$ and $R_2=\langle g_2, v_2, c_2\rangle$. If $g_1(\textit{x})=g_2(\textit{x})\land c_1(\textit{x})=c_2(\textit{x})=\textit{True}$, then the condition $[\forall \textit{x}'\in \mathbb{A}(\textit{x}), g_1(\textit{x}')=g_1(\t
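The premise of Theorem 1 can be sketched in code. The statement above is truncated, so the sketch below covers only the visible premise, $g_1(x)=g_2(x)\land c_1(x)=c_2(x)=\textit{True}$, and not the conclusion. The `RecoveryDefender` class, the `premise_holds` helper, and the toy lambdas are hypothetical names introduced for illustration; treating $g$ as the recovery prediction and $c$ as the certification check is an assumption read off the notation $\langle g, v, c\rangle$.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class RecoveryDefender:
    """Hypothetical container for a recovery defender R = <g, v, c>."""
    g: Callable[[Any], Any]    # assumed: recovered label for a sample
    v: Callable[..., Any]      # assumed: verification function (unused here)
    c: Callable[[Any], bool]   # assumed: True if the sample is certified


def premise_holds(x: Any, r1: RecoveryDefender, r2: RecoveryDefender) -> bool:
    """Check g1(x) == g2(x) and c1(x) == c2(x) == True for one sample."""
    same_label = r1.g(x) == r2.g(x)
    both_certified = r1.c(x) is True and r2.c(x) is True
    return same_label and both_certified


# Toy defenders that ignore their input; purely illustrative.
masking = RecoveryDefender(g=lambda x: "Panda", v=lambda x: None, c=lambda x: True)
voting = RecoveryDefender(g=lambda x: "Panda", v=lambda x: None, c=lambda x: True)
uncertified = RecoveryDefender(g=lambda x: "Panda", v=lambda x: None, c=lambda x: False)
disagreeing = RecoveryDefender(g=lambda x: "Cat", v=lambda x: None, c=lambda x: True)
```

With these toy defenders, `premise_holds("img", masking, voting)` is true, while swapping in `uncertified` or `disagreeing` for either defender makes it false.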

Figures (3)

Figure 1: Examples for a voting-based recovery defender (left) and a masking-based recovery defender (right). Left: Given a benign sample $\textit{x}$, the voting-based recovery defender performs a round of ablating on $\textit{x}$ to generate ablated mutants. Suppose $\Delta=2$. As illustrated in the lower section, for a benign sample, $f$ predicts Panda for 5 mutants and Cat for none. Thus, we have $5-0 > 2\Delta$, and the sample is predicted as Panda. Then, if there is any patch attached to this benign sample (as illustrated in the upper section, for example), at most two mutants would be affected to change the prediction from Panda to Cat in the worst case, resulting in 3 votes for Panda and 2 for Cat. Still, we have $3 > 2$, and the malicious sample is predicted as Panda. Right: Given a benign sample $\textit{x}$, a masking-based recovery defender applies two rounds of masking on $\textit{x}$ to generate mutants. Suppose the prediction labels of all these mutants reach a consensus (all are Panda, as illustrated in the lower section). Then, if there is any patch (which must be masked by at least one mask) overlapping with $\textit{x}$, the original PatchCleanser algorithm guarantees to output Panda for any resulting malicious sample. The upper section illustrates Case II of the original PatchCleanser algorithm. In the first-round masking, mutants may have different prediction labels like Panda, Cat, and Dog. The algorithm continues to perform second-round masking on first-round masked mutants with non-majority labels. If all mutants of any first-round masked mutant reach a consensus on their predicted labels, then the label is output, which is the case for the first-round masked mutant in the top-right corner, where $f$ predicts Panda for all its second-round masked mutants. Thus, Panda is output.
Figure 2: Overview of CrossCert. CrossCert adopts a masking-based recovery defender $R_1 = \langle g_1, v_1, c_1\rangle$ and a voting-based recovery defender $R_2 = \langle g_2, v_2, c_2\rangle$. If the respective conditions are met, $\textit{x}$ is certifiably detectable and certifiably unwavering, respectively. A certifiably unwavering sample is also a certifiably detectable sample, as shown by a property proven in the paper. For a certifiably detectable sample $\textit{x}$, CrossCert guarantees that it must issue a warning for any malicious version $\textit{x}'$ around $\textit{x}$ in the warning verification analysis if the recovered labels of $\textit{x}$ and $\textit{x}'$ differ, as proven by the intersection theorem. For a certifiably unwavering sample $\textit{x}$, CrossCert guarantees that no label change can occur for any malicious samples around $\textit{x}$ and that no warning will be raised in the warning verification analysis, as proven by the consistency theorem.
Figure : PatchCleanser's Prediction
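
The vote-margin arithmetic in the Figure 1 caption (left) can be checked with a short sketch. It assumes, as the caption does, that a patch can flip the predictions of at most $\Delta$ mutants, and it ignores any tie-breaking rule a real voting-based defender may apply. The function names `vote_certify` and `worst_case_counts` are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def vote_certify(labels: Sequence[Hashable], delta: int) -> Tuple[Hashable, bool]:
    """Return the majority label and whether its margin exceeds 2 * delta.

    labels: predictions of the base model f on the ablated mutants.
    delta:  assumed maximum number of mutants a single patch can affect.
    """
    ranked = Counter(labels).most_common()
    top_label, top_votes = ranked[0]
    runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
    certified = (top_votes - runner_up_votes) > 2 * delta
    return top_label, certified


def worst_case_counts(top_votes: int, runner_up_votes: int, delta: int) -> Tuple[int, int]:
    """Vote counts after an attacker moves delta votes from the top label
    to the runner-up, which is the worst case described in the caption."""
    return top_votes - delta, runner_up_votes + delta


# Caption example: 5 mutants all predicted as Panda, Delta = 2.
label, certified = vote_certify(["Panda"] * 5, delta=2)
attacked = worst_case_counts(5, 0, delta=2)
```

For the caption's numbers, the margin is $5 - 0 = 5 > 4 = 2\Delta$, so the sample is certified, and the worst-case counts are 3 for Panda against 2 for Cat, so the majority label is unchanged.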

Theorems & Definitions (10)

Definition 1: Certified Detection
Definition 2: Certified Recovery
Definition 3: Unwavering Certification
Theorem 1: A Condition for Unwavering Certification
Theorem 2: A Condition for Detection Certification
Lemma 1: Recovery Certification of PatchCleanser with Revised Prediction
Lemma 2: Necessary Attack Condition for Masking-based Recovery
Lemma 3: Necessary Attack Condition of Voting-based Recovery
Theorem 3: Unwavering Certification-Warning Verification Consistency
Theorem 4: Detection Certification-Warning Verification Consistency
