CERT-ED: Certifiably Robust Text Classification for Edit Distance
Zhuoqun Huang, Neil G Marchant, Olga Ohrimenko, Benjamin I. P. Rubinstein
TL;DR
CERT-ED introduces a multi-class certifiable defense for NLP that extends Randomized Deletion smoothing to provide provable robustness against all edit-distance perturbations within a radius $r$. By deriving a Levenshtein-distance certificate and coupling it with Monte Carlo estimates, CERT-ED achieves larger certified radii and greater certified cardinality than RanMASK on four of the five datasets evaluated, while maintaining competitive clean accuracy. The approach yields improved empirical robustness under both direct and transfer attacks (38 of 50 settings), especially for longer text sequences, and demonstrates practical efficiency gains over prior smoothing-based methods. This work broadens the applicability of certified defenses in NLP and offers a scalable framework for provable protection against a broad, realistic set of textual perturbations.
Abstract
With the growing integration of AI in daily life, ensuring the robustness of systems to inference-time attacks is crucial. Among the approaches for certifying robustness to such adversarial examples, randomized smoothing has emerged as highly promising because it acts as a wrapper around arbitrary black-box models. Previous work on randomized smoothing in natural language processing has primarily focused on specific subsets of edit distance operations, such as synonym substitution or word insertion, without exploring the certification of all edit operations. In this paper, we adapt Randomized Deletion (Huang et al., 2023) and propose the CERTified Edit Distance defense (CERT-ED) for natural language classification. Through comprehensive experiments, we demonstrate that CERT-ED outperforms the existing Hamming distance method RanMASK (Zeng et al., 2023) on 4 out of 5 datasets in terms of both accuracy and the cardinality of the certificate. Across various threat models, including 5 direct and 5 transfer attacks, our method improves empirical robustness in 38 out of 50 settings.
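To make the two ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It shows (a) the Levenshtein edit distance that defines the threat model and (b) a deletion-based smoothed classifier whose prediction is a Monte Carlo majority vote over randomly deleted copies of the input. The function names, the deletion probability `p_del`, the sample count `n_samples`, and the toy base classifier are all illustrative assumptions. The certificate computation itself (turning vote statistics into a certified radius) is omitted.

```python
import random
from collections import Counter


def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        prev = curr
    return prev[-1]


def random_deletion(tokens, p_del, rng):
    """Drop each token independently with probability p_del (assumed mechanism)."""
    return [t for t in tokens if rng.random() >= p_del]


def smoothed_predict(base_classifier, tokens, p_del=0.9, n_samples=1000, seed=0):
    """Majority vote of the base classifier over randomly deleted inputs.

    Returns the winning label and its empirical vote share. A real
    certificate would additionally need a confidence bound on this share.
    """
    rng = random.Random(seed)
    votes = Counter(
        base_classifier(random_deletion(tokens, p_del, rng))
        for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


# Hypothetical usage with a toy keyword-based classifier.
def toy_classifier(tokens):
    return "pos" if "good" in tokens else "neg"


label, share = smoothed_predict(
    toy_classifier, "a very good movie".split(), p_del=0.5, n_samples=200
)
```

Because the base classifier is only ever called on perturbed inputs and its outputs are only counted, any black-box model can be wrapped this way, which is the property the abstract attributes to randomized smoothing.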
