RS-Del: Edit Distance Robustness Certificates for Sequence Classifiers via Randomized Deletion

Zhuoqun Huang; Neil G. Marchant; Keane Lucas; Lujo Bauer; Olga Ohrimenko; Benjamin I. P. Rubinstein

TL;DR

This work extends certified robustness to discrete, variable-length sequences by introducing RS-Del, a randomized deletion smoothing mechanism. Because the established Neyman-Pearson approach is intractable in this setting, the authors organize the certification proof around longest common subsequences (LCS) and derive edit-distance certificates that cover insertion, deletion, and substitution perturbations. In the malware-detection case study, RS-Del applied to MalConv achieves a certified accuracy of 91% at an edit distance radius of 128 bytes, with minimal accuracy loss and asymmetry-enabled radius advantages. The approach is compatible with arbitrary base classifiers, supports training on perturbed data, and can operate on byte- or chunk-level representations, broadening the applicability of certified robustness to discrete domains like executable binaries and source code.
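To make the threat model concrete, the edit distance that bounds the adversary can be computed with the standard dynamic program. The sketch below assumes unit cost for each deletion, insertion, and substitution (the Levenshtein distance); the function name `levenshtein` is illustrative and not taken from the paper.

```python
def levenshtein(a: bytes, b: bytes) -> int:
    """Minimum number of single-element deletions, insertions and
    substitutions needed to turn sequence a into sequence b."""
    # prev[j] holds the distance between the first i-1 elements of a
    # and the first j elements of b.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty prefix takes i deletions
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # substitute x with y (free if equal)
            ))
        prev = curr
    return prev[-1]
```

Under this distance, a certificate of radius 128 bytes covers every file reachable from the input by at most 128 such byte-level edits.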

Abstract

Randomized smoothing is a leading approach for constructing classifiers that are certifiably robust against adversarial examples. Existing work on randomized smoothing has focused on classifiers with continuous inputs, such as images, where $\ell_p$-norm bounded adversaries are commonly studied. However, there has been limited work for classifiers with discrete or variable-size inputs, such as for source code, which require different threat models and smoothing mechanisms. In this work, we adapt randomized smoothing for discrete sequence classifiers to provide certified robustness against edit distance-bounded adversaries. Our proposed smoothing mechanism randomized deletion (RS-Del) applies random deletion edits, which are (perhaps surprisingly) sufficient to confer robustness against adversarial deletion, insertion and substitution edits. Our proof of certification deviates from the established Neyman-Pearson approach, which is intractable in our setting, and is instead organized around longest common subsequences. We present a case study on malware detection--a binary classification problem on byte sequences where classifier evasion is a well-established threat model. When applied to the popular MalConv malware detection model, our smoothing mechanism RS-Del achieves a certified accuracy of 91% at an edit distance radius of 128 bytes.
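The smoothing mechanism described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes each element is deleted independently with probability `p_del`, and it aggregates base-classifier outputs by plain majority vote, whereas the paper's smoothed classifier also involves decision thresholds. The names `random_deletion`, `smoothed_predict` and `base_classifier` are hypothetical.

```python
import random
from collections import Counter
from typing import Callable


def random_deletion(x: bytes, p_del: float, rng: random.Random) -> bytes:
    """Return a random subsequence of x: each byte is dropped
    independently with probability p_del (assumed mechanism)."""
    return bytes(b for b in x if rng.random() >= p_del)


def smoothed_predict(
    x: bytes,
    base_classifier: Callable[[bytes], int],
    p_del: float,
    n_samples: int,
    seed: int = 0,
) -> int:
    """Monte Carlo estimate of the smoothed prediction: classify
    n_samples randomly deleted copies of x and return the most
    frequent label (majority vote is a simplifying assumption)."""
    rng = random.Random(seed)
    votes = Counter(
        base_classifier(random_deletion(x, p_del, rng))
        for _ in range(n_samples)
    )
    return votes.most_common(1)[0][0]
```

Because the mechanism only deletes, every perturbed input is a subsequence of the original, which is what connects the analysis to longest common subsequences.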

Paper Structure (77 sections, 7 theorems, 44 equations, 7 figures, 12 tables, 1 algorithm)

Introduction
Preliminaries
Sequence classification
Robustness certification
Edit distance robustness
Threat model
RS-Del: Randomized deletion smoothing
Randomized smoothing
Randomized deletion mechanism
Practical considerations
Probabilistic certification
Training
Sequence chunking
Edit distance robustness certificate
Derivation outline
...and 62 more sections

Key Result

Proposition 3

A sufficient condition for eqn:rsdel-cert is $\rho(\bm{x}, \mu_y) \geq \nu_y(\bm{\eta})$, where $\rho(\bm{x}, \mu_y)$ is a tight lower bound on the confidence for class $y$ and $\nu_y(\bm{\eta})$ is a threshold defined in terms of the decision thresholds $\bm{\eta}$.

Figures (7)

Figure 1: Probabilistic certification of $\mathsf{RS\text{-}Del}$. Here $\bm{x}$ is the input sequence, ${f}_{\mathrm{b}}$ is the base classifier, $p_\mathsf{del}$ is the deletion probability, $\bm{\eta}$ is the set of decision thresholds, $\alpha$ is the significance level, and $n_\mathrm{pred}, n_\mathrm{bnd}$ are sample sizes. $\texttt{BinLCB}(k,n,\alpha)$ returns a lower confidence bound for $p$ at level $\alpha$ given $k \sim \operatorname*{Bin}(n, p)$.
Figure 2: Clean accuracy and robustness metrics for $\mathsf{RS\text{-}Del}$ as a function of dataset and deletion probability $p_\mathsf{del}$. All metrics are computed on the test set. "Median CR" is the median certified Levenshtein distance radius in bytes and "median NCR %" is the median certified Levenshtein distance radius normalized as a percentage of the file size. A good tradeoff is achieved when $p_\mathsf{del} = 99.5\%$ (in bold).
Figure 3: Illustration of the deletion smoothing mechanism applied to an executable file at the byte-level versus chunk-level. Left: An executable file where the elementary byte sequence representation is shown in the 2nd column and chunks that correspond to machine instructions are shown in the 3rd column (sourced from the Ghidra disassembler). Bytes that do not correspond to machine instructions are marked NI. Shading represents bytes (light gray) or instruction chunks (dark gray) that are deleted in the corresponding perturbed file to the right. Middle: A perturbed file produced by the deletion mechanism operating at the byte-level (Byte). Notice that individual instructions may be partially deleted. Right: A perturbed file produced by the deletion mechanism operating at the chunk-level (Insn).
Figure 4: Certified accuracy for $\mathsf{RS\text{-}Del}$ as a function of the radius in bytes (left horizontal axis), radius normalized by file size (right horizontal axis) and byte deletion probability $p_\mathsf{del}$ (line styles). The results are plotted for the Sleipnir2 test set under the byte-level Levenshtein distance threat model (with $O = \{\mathsf{del}, \mathsf{ins}, \mathsf{sub}\}$). The grey vertical lines in the left plot represent the best achievable certified radius for $\mathsf{RS\text{-}Del}$ (setting $\mu_y = 1$ in the expressions in Table 1).
Figure 5: Certified accuracy for $\mathsf{RS\text{-}Del}$ with chunk-level deletion (Insn) as a function of the radius in chunks (left horizontal axis), radius normalized by sequence length in chunks (right horizontal axis) and chunk deletion probability $p_\mathsf{del}$ (line styles). The results are plotted for the Sleipnir2 test set under the chunk-level Levenshtein distance threat model (with $O = \{\mathsf{del}, \mathsf{ins}, \mathsf{sub}\}$). The grey vertical lines in the left plot represent the best achievable certified radius for $\mathsf{RS\text{-}Del}$ (setting $\mu_y = 1$ in the expressions in Table 1).
...and 2 more figures
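The $\texttt{BinLCB}(k,n,\alpha)$ routine named in the Figure 1 caption can be illustrated with a short sketch. The caption only states that it returns a lower confidence bound for $p$; the version below assumes the one-sided Clopper-Pearson (exact binomial) bound, found by bisection, which may differ from the method used in the paper. The names `binom_upper_tail` and `bin_lcb` are hypothetical.

```python
import math


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Bin(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def bin_lcb(k: int, n: int, alpha: float) -> float:
    """One-sided Clopper-Pearson lower bound (assumed method): the
    largest p_low with P(X >= k | p_low) <= alpha, so that the true p
    is at least p_low with confidence 1 - alpha."""
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    if k == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(100):  # bisection; the upper tail is increasing in p
        mid = (lo + hi) / 2
        if binom_upper_tail(k, n, mid) < alpha:
            lo = mid  # mid is still below the bound
        else:
            hi = mid
    return lo  # returning lo keeps the bound conservative
```

In a certification loop of this kind, `k` would be the number of the $n_\mathrm{bnd}$ perturbed samples assigned to the predicted class, and the returned value would serve as the confidence lower bound at significance level $\alpha$.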

Theorems & Definitions (13)

Remark 1
Remark 2
Proposition 3
Lemma 4: Equivalent edits
Theorem 5
Corollary 6
Theorem 7: Levenshtein distance certificate
Corollary 8
Remark 9
Remark 10
...and 3 more
