Toward Patch Robustness Certification and Detection for Deep Learning Systems Beyond Consistent Samples
Qilin Zhou, Zhengyuan Wei, Haipeng Wang, Zhuo Wang, W. K. Chan
TL;DR
HiCert introduces a masking-based certified detection framework that certifies both consistent and inconsistent benign samples and guarantees detection of all harmful patched variants of the samples it certifies. By combining a thresholded confidence criterion with a formal relation between inconsistent mutants and their benign counterparts, HiCert extends beyond prior methods that only cover consistent samples. The approach is shown to achieve state-of-the-art certified accuracy and defense success against real adversarial patches across ImageNet, CIFAR100, and GTSRB, while maintaining comparable time complexity. This work broadens the practical applicability of patch robustness certification and detection for safety-critical DL systems, enabling more reliable downstream deployment. It also provides a formal foundation (soundness theorems) and extensive empirical analysis, including a breakdown of behavior on inconsistent samples and a study of trade-offs with patch size and detection thresholds.
Abstract
Patch robustness certification is an emerging kind of provable defense technique against adversarial patch attacks for deep learning systems. Certified detection ensures the detection of all patched harmful versions of certified samples, which mitigates the failures of empirical defense techniques that could (easily) be compromised. However, existing certified detection methods are ineffective in certifying samples that are misclassified or whose mutants are inconsistently predicted to different labels. This paper proposes HiCert, a novel masking-based certified detection technique. Through a formal analysis of mutants predicted with a label different from the true label, HiCert formulates a novel formal relation between harmful samples generated by identified loopholes and their benign counterparts. By checking the bound of the maximum confidence among these potentially harmful (i.e., inconsistent) mutants of each benign sample, HiCert ensures that each harmful sample either has the minimum confidence among mutants that are predicted the same as the harmful sample itself below this bound, or has at least one mutant predicted with a label different from the harmful sample itself. This guarantee builds on two novel insights. As such, HiCert systematically certifies those inconsistent samples and consistent samples to a large extent. To our knowledge, HiCert is the first work capable of providing such a comprehensive patch robustness certification for certified detection. Our experiments show the high effectiveness of HiCert with a new state-of-the-art performance: it certifies significantly more benign samples, including those inconsistent and consistent, and achieves significantly higher accuracy on those samples without warnings and a significantly lower false silent ratio.
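
The two conditions described in the abstract can be illustrated with a minimal sketch. This is a simplified reading of the abstract only, not the authors' implementation: the function names `certify` and `warns`, the representation of each masked mutant as a `(label, confidence)` pair, and the threshold name `tau` are all assumptions made for illustration. Mask generation and the model itself are left out.

```python
from typing import List, Tuple

# Hypothetical representation: one masked mutant = (predicted label, confidence).
Mutant = Tuple[int, float]


def certify(benign_mutants: List[Mutant], true_label: int, tau: float) -> bool:
    """Assumed certification check for a benign sample.

    The sample is certified when the maximum confidence among its
    inconsistent mutants (label != true_label) stays below the bound tau.
    A sample with no inconsistent mutants passes trivially.
    """
    inconsistent = [conf for label, conf in benign_mutants if label != true_label]
    return max(inconsistent, default=0.0) < tau


def warns(mutants: List[Mutant], predicted_label: int, tau: float) -> bool:
    """Assumed detection rule applied to any incoming sample.

    A warning is raised when (a) some mutant disagrees with the sample's
    own predicted label, or (b) the minimum confidence among the mutants
    that agree with it falls below tau.
    """
    if any(label != predicted_label for label, _ in mutants):
        return True
    return min((conf for _, conf in mutants), default=1.0) < tau


# Illustrative values only (not data from the paper).
benign = [(3, 0.9), (3, 0.8), (5, 0.4)]   # one inconsistent mutant, conf 0.4
patched = [(5, 0.95), (5, 0.9), (5, 0.4)]  # last mutant masks the patch
is_certified = certify(benign, true_label=3, tau=0.5)
is_flagged = warns(patched, predicted_label=5, tau=0.5)
```

The intuition behind the sketch: if a mask fully covers the patch, the corresponding mutant of the patched sample coincides with a mutant of the benign sample. That shared mutant is either predicted with the true label (so it disagrees with the harmful prediction) or is an inconsistent mutant whose confidence is below `tau`, so one of the two warning conditions fires.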
