Regular Expression Denial of Service Induced by Backreferences

Yichen Liu; Berk Çakar; Aman Agrawal; Minseok Seo; James C. Davis; Dongyoon Lee

Abstract

This paper presents the first systematic study of denial-of-service vulnerabilities in Regular Expressions with Backreferences (REwB). We introduce the Two-Phase Memory Automaton (2PMFA), an automaton model that precisely captures REwB semantics. Using this model, we derive necessary conditions under which backreferences induce super-linear backtracking runtime, even when sink ambiguity is linear -- a regime where existing detectors report no vulnerability. Based on these conditions, we identify three vulnerability patterns, develop detection and attack-construction algorithms, and validate them in practice. Using the Snort intrusion detection ruleset, our evaluation identifies 45 previously unknown REwB vulnerabilities with quadratic or worse runtime. We further demonstrate practical exploits against Snort, including slowing rule evaluation by 0.6-1.2 seconds and bypassing alerts by triggering PCRE's matching limit.

Paper Structure (72 sections, 7 theorems, 34 equations, 8 figures, 2 tables, 4 algorithms)

Introduction
Background
Regular Expressions and Backreferences
Backreferences.
Automata Equivalence and Irregular Constructs.
ReDoS and Regex Complexity
Matching Algorithms
Ambiguity
Sink Automaton
Complexity Characterization
Motivation and Problem Statement
Motivating Example
Threat Model
Attacker capabilities.
Victim environment.
...and 57 more sections

Key Result

Theorem A

For any $\epsilon$-loop-free NFA $A$, its backtracking runtime satisfies $\mathrm{BtRtN}(A,n) \in \mathcal{O}(\mathrm{SinkAbgN}(A,n))$.
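To make the quantity bounded in Theorem A more concrete, the following minimal sketch counts the calls made by a naive, memoization-free backtracking matcher on a small NFA with no $\epsilon$-transitions. It is an illustration only: the dictionary encoding of the NFA, the function name `backtrack_steps`, and the one-state models of /(a|a)*/ and /a*/ are assumptions made for this example, not the paper's formal definitions of $\mathrm{BtRtN}$ or $\mathrm{SinkAbgN}$, and the sink automaton itself is not constructed here.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical encoding: nfa[state][symbol] -> ordered list of successor states.
NFA = Dict[int, Dict[str, List[int]]]


def backtrack_steps(nfa: NFA, start: int, accepting: Set[int], text: str) -> Tuple[bool, int]:
    """Full-match `text` by depth-first search without memoization.

    Returns (matched, number of explore() calls).
    """
    steps = 0

    def explore(state: int, pos: int) -> bool:
        nonlocal steps
        steps += 1
        if pos == len(text):
            return state in accepting
        # Try each transition in priority order; backtrack on failure.
        for nxt in nfa.get(state, {}).get(text[pos], []):
            if explore(nxt, pos + 1):
                return True
        return False

    return explore(start, 0), steps


# Simplified model of /(a|a)*/: two distinct 'a' self-loops on one state.
AMBIGUOUS: NFA = {0: {"a": [0, 0]}}
# Simplified model of /a*/: a single 'a' self-loop.
UNAMBIGUOUS: NFA = {0: {"a": [0]}}

if __name__ == "__main__":
    rejected_input = "a" * 10 + "b"
    print(backtrack_steps(AMBIGUOUS, 0, {0}, rejected_input))    # (False, 2047)
    print(backtrack_steps(UNAMBIGUOUS, 0, {0}, rejected_input))  # (False, 11)
```

On a rejected input of n leading `a` characters, the ambiguous model makes 2^(n+1) - 1 calls while the unambiguous one makes n + 1, which is the kind of gap that an ambiguity-based bound is meant to capture.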

Figures (8)

Figure 1: Matching $(_1\, \backslash 1\, \texttt{b} \mid \texttt{a}\, )_1^*$ against 'aababb'. The capture table stores the committed value from the prior iteration alongside the substring being captured in the current iteration.
Figure 2: (1) NFA and (2) Sink-NFA of regex /a(b*)c/. (3) NFA and (4) Sink-NFA of regex /a*b*/. (5) NFA and (6) Sink-NFA of regex /(a|a)*/.
Figure 3: Matching time for a regex from the Snort ruleset, evaluated on a benign input and two adversarial inputs exploiting infinite degree of ambiguity (IDA) and a combination of IDA with backreferences.
Figure 4: Sink automaton $\mathrm{Sink}(A)$ for the regex $(_1\, \texttt{a}^*\, )_1\, \backslash 1\, \texttt{b}$. The original automaton $A$ has a single a-loop at $q_1$; no two overlapping loops exist.
Figure 5: Structural conditions for super-linear backreference behavior. (a) A backreference incurs non-$\mathcal{O}(1)$ cost when its capture group contains a loop (left) or another non-$\mathcal{O}(1)$ backreference (right). (b)--(c) A backreference is evaluated non-$\mathcal{O}(1)$ times when it appears after a loop (b) or inside a loop (c).
...and 3 more figures
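The structural conditions in the Figure 5 caption can be illustrated on the Figure 4 regex, written here in conventional syntax as `(a*)\1b`. The sketch below is a hand-written backtracking matcher for that single regex that counts character comparisons. It is an assumption-laden illustration, not the paper's 2PMFA model or detection algorithm: the function name `match_capture_backref`, the greedy-then-backtrack strategy, and the choice of comparisons as the cost measure are all introduced for this example.

```python
from typing import Tuple


def match_capture_backref(text: str) -> Tuple[bool, int]:
    """Backtracking full match of (a*)\\1b, returning (matched, comparisons)."""
    comparisons = 0

    # Greedy a*: consume the longest run of leading 'a' characters.
    run = 0
    while run < len(text):
        comparisons += 1
        if text[run] != "a":
            break
        run += 1

    # Backtrack over every capture length, longest first.
    for k in range(run, -1, -1):
        pos = k
        ok = True
        # Backreference \1: compare captured text[0:k] with text[k:2k].
        # Its cost grows with k because the capture group contains a loop.
        for i in range(k):
            if pos >= len(text):
                ok = False
                break
            comparisons += 1
            if text[pos] != text[i]:
                ok = False
                break
            pos += 1
        if not ok:
            continue
        # Literal 'b', which must also be the last character of the input.
        if pos < len(text):
            comparisons += 1
            if text[pos] == "b" and pos + 1 == len(text):
                return True, comparisons
    return False, comparisons


if __name__ == "__main__":
    for n in (100, 200, 400):
        matched, cost = match_capture_backref("a" * n)
        print(n, matched, cost)
```

On inputs consisting only of `a`, each of the n + 1 candidate capture lengths triggers a backreference check of non-constant length, so doubling n roughly quadruples the comparison count even though the automaton has a single loop.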

Theorems & Definitions (19)

definition 1
Theorem A: Backtracking Runtime Bound [weideman2016sink]
Theorem B: Two-Overlap-Loop Characterization [weber1991ida, alluazen2008]
definition 2
definition 3
proof
lemma 1
proof
lemma 2
proof
...and 9 more
