CERT-ED: Certifiably Robust Text Classification for Edit Distance

Zhuoqun Huang; Neil G Marchant; Olga Ohrimenko; Benjamin I. P. Rubinstein

TL;DR

CERT-ED introduces a multi-class certifiable defense for NLP that extends Randomized Deletion smoothing to provide provable robustness against all edit-distance perturbations within a radius $r$. By deriving a Levenshtein-distance certificate and coupling it with Monte Carlo estimates, CERT-ED outperforms RanMASK in both accuracy and certified cardinality on 4 of the 5 datasets evaluated. The approach also improves empirical robustness under both direct and transfer attacks, especially for longer text sequences, and demonstrates practical efficiency gains over prior smoothing-based methods. This work broadens the applicability of certified defenses in NLP and offers a scalable framework for provable protection against a broad, realistic set of textual perturbations.

Abstract

With the growing integration of AI in daily life, ensuring the robustness of systems to inference-time attacks is crucial. Among the approaches for certifying robustness to such adversarial examples, randomized smoothing has emerged as highly promising because it acts as a wrapper around arbitrary black-box models. Previous work on randomized smoothing in natural language processing has primarily focused on specific subsets of edit distance operations, such as synonym substitution or word insertion, without exploring the certification of all edit operations. In this paper, we adapt Randomized Deletion (Huang et al., 2023) and propose the CERTified Edit Distance defense (CERT-ED) for natural language classification. Through comprehensive experiments, we demonstrate that CERT-ED outperforms the existing Hamming distance method RanMASK (Zeng et al., 2023) in 4 out of 5 datasets in terms of both accuracy and the cardinality of the certificate. By covering various threat models, including 5 direct and 5 transfer attacks, our method improves empirical robustness in 38 out of 50 settings.
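The smoothing idea described in the abstract can be illustrated with a minimal sketch. It assumes that each token is deleted independently with probability `p_del` and that the smoothed prediction is the majority vote of a black-box base classifier over the randomly shortened inputs. The names `randomly_delete`, `smoothed_predict`, and `toy_classifier` are hypothetical and not taken from the paper, and the sketch does not compute a certified radius; it only shows the Monte Carlo voting step.

```python
import random
from collections import Counter
from typing import Callable, Hashable, List, Sequence, Tuple


def randomly_delete(tokens: Sequence[str], p_del: float, rng: random.Random) -> List[str]:
    """Keep each token independently with probability 1 - p_del (order preserved)."""
    return [tok for tok in tokens if rng.random() >= p_del]


def smoothed_predict(
    base_classifier: Callable[[List[str]], Hashable],
    tokens: Sequence[str],
    p_del: float = 0.9,
    n_samples: int = 1000,
    seed: int = 0,
) -> Tuple[Hashable, float]:
    """Monte Carlo estimate of the majority label and its empirical vote share."""
    rng = random.Random(seed)
    votes = Counter(
        base_classifier(randomly_delete(tokens, p_del, rng)) for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classifier(tokens: List[str]) -> str:
    """Hypothetical stand-in for a black-box model: flags inputs containing 'prize'."""
    return "spam" if "prize" in tokens else "ham"


if __name__ == "__main__":
    text = "claim your free prize now".split()
    print(smoothed_predict(toy_classifier, text, p_del=0.5, n_samples=200))
```

The vote share returned here is only a point estimate. A real certificate would replace it with a statistical confidence bound before deriving any radius.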

Paper Structure (47 sections, 6 theorems, 28 equations, 3 figures, 12 tables)

Introduction
Edit distance robustness
Certified robustness via randomized smoothing
Randomized deletion smoothing
Practicalities
CERT-ED: Multi-class edit distance certification
Experiments
Datasets
Models
Certified accuracy and robustness
Setup
Clean accuracy
Certified accuracy and certified cardinality
Empirical robustness
Attack setup
...and 32 more sections

Key Result

Theorem 1

Consider a pair of text inputs $\bm{x}, \bar{\bm{x}} \in \mathcal{X}$. Suppose $\bar{\bm{x}}$ can be transformed into $\bm{x}$ using a minimal number of edit operations by deleting $n_\mathsf{del}$ tokens, inserting $n_\mathsf{ins}$ tokens and substituting $n_\mathsf{sub}$ tokens, i.e., ...

Figures (3)

Figure 1: Top: Clean sample from Spam-assassin dataset. Middle: CERT-ED applied to the perturbed input to produce edit distance certified prediction of "Spam" and certified radius of $3$. Bottom: Real adversarial sample generated by CLARE (Li et al., 2021) against a model without CERT-ED. The green words are adversarially inserted words. CERT-ED is certifiably robust to this adversarial example as the edit distance between the clean and adversarial inputs is $2$, less than the certified radius.
Figure 2: Certified accuracy for CERT-ED and RanMASK as a function of the log-cardinality of the certificate for the SatNews dataset. We see that CERT-ED certifies a set up to $10^{10}$ times larger than RanMASK for the same accuracy.
Figure 3: Certified accuracy for CERT-ED and RanMASK as a function of log certified cardinality and perturbation strength $p_\mathsf{del}$ and $p_\mathsf{mask}$ (line styles). The certified cardinality is exact for RanMASK but a lower bound is used for CERT-ED. CERT-ED dominates RanMASK in terms of certified accuracy for 3 out of 4 datasets. See Figure 2 for certified accuracy on SatNews.
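The check described in the Figure 1 caption (edit distance $2$ against a certified radius of $3$) can be reproduced with a standard token-level Levenshtein distance. The sketch below is the textbook dynamic program, not code from the paper. The helper `within_certified_radius` and the example sentences are hypothetical, and treating the radius as inclusive is an assumption.

```python
from typing import Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Minimum number of token insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, tok_a in enumerate(a, start=1):
        curr = [i]  # deleting the first i tokens of a yields the empty prefix of b
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(
                prev[j] + 1,         # delete tok_a
                curr[j - 1] + 1,     # insert tok_b
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def within_certified_radius(clean: Sequence[str], perturbed: Sequence[str], radius: int) -> bool:
    """Assumption: a certificate of radius r covers all inputs at edit distance <= r."""
    return levenshtein(clean, perturbed) <= radius


if __name__ == "__main__":
    clean = "claim your free prize now".split()
    perturbed = "claim your totally free prize right now".split()  # two inserted tokens
    print(levenshtein(clean, perturbed), within_certified_radius(clean, perturbed, 3))
```

Because insertions, deletions and substitutions all count here, the same function covers perturbations that a Hamming distance check (substitutions only, equal lengths) cannot express.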

Theorems & Definitions (13)

Theorem 1 (General pairwise certificate)
Theorem 2 (Levenshtein distance certificate)
Lemma 3 (Huang et al., 2023)
Lemma 4
proof
proof
proof
Definition 1
Proposition 5
proof
...and 3 more
