Toward Patch Robustness Certification and Detection for Deep Learning Systems Beyond Consistent Samples

Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Zhuo Wang, W. K. Chan

TL;DR

HiCert introduces a masking-based certified detection framework that certifies both consistent and inconsistent benign samples and guarantees detection of all harmful patched variants of the samples it certifies. By leveraging a thresholded confidence criterion and a formal relation between inconsistent mutants and their benign counterparts, HiCert extends beyond prior methods that only cover consistent samples. The approach is shown to achieve state-of-the-art certified accuracy and defense success against real adversarial patches across ImageNet, CIFAR100, and GTSRB, while maintaining comparable time complexity. This work broadens the practical applicability of patch robustness certification and detection for safety-critical DL systems, enabling more reliable downstream deployment. It also provides a formal foundation (soundness theorems) and extensive empirical analysis, including a breakdown of behavior on inconsistent samples and a study of trade-offs with patch size and detection thresholds.

Abstract

Patch robustness certification is an emerging kind of provable defense technique against adversarial patch attacks for deep learning systems. Certified detection ensures the detection of all patched harmful versions of certified samples, which mitigates the failures of empirical defense techniques that could (easily) be compromised. However, existing certified detection methods are ineffective in certifying samples that are misclassified or whose mutants are inconsistently predicted to different labels. This paper proposes HiCert, a novel masking-based certified detection technique. By focusing, through formal analysis, on the problem of mutants predicted with a label different from the true label, HiCert formulates a novel formal relation between harmful samples generated by identified loopholes and their benign counterparts. By checking the bound of the maximum confidence among these potentially harmful (i.e., inconsistent) mutants of each benign sample, HiCert ensures that each harmful sample either has the minimum confidence among mutants that are predicted the same as the harmful sample itself below this bound, or has at least one mutant predicted with a label different from the harmful sample itself, formulated after two novel insights. As such, HiCert systematically certifies those inconsistent samples and consistent samples to a large extent. To our knowledge, HiCert is the first work capable of providing such a comprehensive patch robustness certification for certified detection. Our experiments show the high effectiveness of HiCert with a new state-of-the-art performance: it certifies significantly more benign samples, including those inconsistent and consistent, and achieves significantly higher accuracy on those samples without warnings and a significantly lower false silent ratio.
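
The two conditions described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Mutant`, `certify_benign`, `raises_warning`, and the threshold `tau` are hypothetical, the mutant labels and confidences are assumed to come from some classifier applied to masked copies of the input, and the exact inequalities and any additional conditions used in the paper may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Mutant:
    """Prediction of the model on one masked copy of an input (hypothetical type)."""
    label: int
    confidence: float


def certify_benign(mutants: Sequence[Mutant], true_label: int, tau: float) -> bool:
    """Assumed certification check: every inconsistent mutant (label differs
    from the true label) must have confidence strictly below the bound tau."""
    inconsistent = [m.confidence for m in mutants if m.label != true_label]
    # With no inconsistent mutants the bound holds trivially.
    return max(inconsistent, default=0.0) < tau


def raises_warning(mutants: Sequence[Mutant], sample_label: int, tau: float) -> bool:
    """Assumed detection rule: warn if any mutant disagrees with the sample's
    own label, or if the least confident agreeing mutant falls below tau."""
    if any(m.label != sample_label for m in mutants):
        return True
    return min((m.confidence for m in mutants), default=0.0) < tau


# Benign sample with true label 3; one mutant is inconsistent (label 5).
benign = [Mutant(3, 0.9), Mutant(3, 0.8), Mutant(5, 0.4)]
print(certify_benign(benign, true_label=3, tau=0.5))

# A patched version that flips every prediction to label 5. The mutant whose
# mask covers the patch is unchanged from the benign case (label 5, conf 0.4).
patched = [Mutant(5, 0.95), Mutant(5, 0.9), Mutant(5, 0.4)]
print(raises_warning(patched, sample_label=5, tau=0.5))
```

In this sketch the link between the two functions is the mutant whose mask covers the patch: masking removes the patch, so that mutant's prediction is the same for the benign and the patched image. If its confidence was bounded below `tau` during certification, the patched input cannot be both fully consistent and above the threshold, so a warning is raised.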

Paper Structure

This paper contains 56 sections, 4 theorems, 15 figures, 7 tables.

Key Result

Theorem 1

If the patch region is covered by a mask whose corresponding mutant's label is the same as the true label, it is infeasible for harmful samples to show no label difference. (i.e., if the condition $[\exists \textsc{m}_\textsc{p} \in\mathbb{M}_\mathbb{P}, \textsc{m}_\textsc{p}\odot{\textsc{p}}=\texts

Figures (15)

  • Figure 1: One possible patch attack scenario targeting traffic sign recognition systems \cite{hussain2024evaluating}, further threatening the reliability of autonomous driving systems.
  • Figure 2: A patch attack detection framework with certified detection.
  • Figure 3: HiCert and its three purposes achieved by a unified relation. The unified relation is presented as Thm. \ref{thm:Inconsistent-Max-Min}.
  • Figure 4: Illustration of the concepts of masking. A military aircraft (from ImageNet \cite{deng2009imagenet}) may be patched to evade DL-based inspection.
  • Figure 5: Illustration of the $D_\text{OMA}$ defender.
  • ...and 10 more figures

Theorems & Definitions (8)

  • Definition 1: Certified Detection
  • Definition 2: $D_\text{OMA}$ condition
  • Theorem 1: Consistent mutants are infeasible places for attackers
  • Theorem 2: HiCert Certification
  • Theorem : Consistent mutants are infeasible places for attackers
  • proof
  • Theorem : HiCert Certification
  • proof