Regular Expression Denial of Service Induced by Backreferences

Yichen Liu, Berk Çakar, Aman Agrawal, Minseok Seo, James C. Davis, Dongyoon Lee

Abstract

This paper presents the first systematic study of denial-of-service vulnerabilities in Regular Expressions with Backreferences (REwB). We introduce the Two-Phase Memory Automaton (2PMFA), an automaton model that precisely captures REwB semantics. Using this model, we derive necessary conditions under which backreferences induce super-linear backtracking runtime, even when sink ambiguity is linear, a regime in which existing detectors report no vulnerability. Based on these conditions, we identify three vulnerability patterns, develop detection and attack-construction algorithms, and validate them in practice. Using the Snort intrusion detection ruleset, our evaluation identifies 45 previously unknown REwB vulnerabilities with quadratic or worse runtime. We further demonstrate practical exploits against Snort, including slowing rule evaluation by 0.6 to 1.2 seconds and bypassing alerts by triggering PCRE's matching limit.

Paper Structure

This paper contains 72 sections, 7 theorems, 34 equations, 8 figures, 2 tables, and 4 algorithms.

Key Result

Theorem A

For any $\epsilon$-loop-free NFA $A$, its backtracking runtime satisfies $\mathrm{BtRtN}(A,n) \in \mathcal{O}(\mathrm{SinkAbgN}(A,n))$.
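
To make the right-hand side of Theorem A more concrete, the following Python sketch counts NFA runs over every prefix of an input. This is a simplified stand-in for sink ambiguity, not the paper's exact definition. The transition-table encoding, the function name `count_runs_over_prefixes`, and the two hand-built, epsilon-free automata (intended to mirror /(a|a)*/ and /a*b*/) are assumptions made for illustration.

```python
from collections import Counter

def count_runs_over_prefixes(transitions, start, text):
    """Count distinct NFA runs from `start` over every prefix of `text`.

    `transitions` maps (state, symbol) to a list of target states; a target
    listed twice models two parallel edges (hypothetical encoding).
    """
    frontier = Counter({start: 1})  # state -> number of runs ending there
    total = 1                       # the empty prefix has exactly one run
    for ch in text:
        nxt = Counter()
        for state, n_runs in frontier.items():
            for target in transitions.get((state, ch), ()):
                nxt[target] += n_runs
        frontier = nxt
        total += sum(frontier.values())
    return total

# Assumed automaton for /(a|a)*/: one state with two parallel a-loops.
AMBIGUOUS = {(0, "a"): [0, 0]}

# Assumed automaton for /a*b*/: an a-loop on state 0, a b-loop on state 1.
UNAMBIGUOUS = {(0, "a"): [0], (0, "b"): [1], (1, "b"): [1]}

if __name__ == "__main__":
    for n in (4, 8, 12):
        print(n, count_runs_over_prefixes(AMBIGUOUS, 0, "a" * n))
    print(count_runs_over_prefixes(UNAMBIGUOUS, 0, "aaabbb"))
```

Under this simplified measure, the run count for the /(a|a)*/ automaton doubles with each additional `a`, while the /a*b*/ automaton contributes one run per prefix and so stays linear in the input length.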

Figures (8)

  • Figure 1: Matching $\lcg_1 \backslash 1 \texttt{b} \mid \texttt{a} \rcg_1^*$ against 'aababb'. The capture table stores the committed value from the prior iteration alongside the substring being captured in the current iteration.
  • Figure 2: (1) NFA and (2) Sink-NFA of regex /a(b*)c/. (3) NFA and (4) Sink-NFA of regex /a*b*/. (5) NFA and (6) Sink-NFA of regex /(a|a)*/.
  • Figure 3: Matching time for a regex from the Snort ruleset, evaluated on a benign input and two adversarial inputs exploiting infinite degree of ambiguity (IDA) and a combination of IDA with backreferences.
  • Figure 4: Sink automaton $\mathrm{Sink}(A)$ for the regex $\lcg_1 \texttt{a}^* \rcg_1 \backslash\!1\;\texttt{b}$. The original automaton $A$ has a single a-loop at $q_1$; no two overlapping loops exist.
  • Figure 5: Structural conditions for super-linear backreference behavior. (a) A backreference incurs non-$\mathcal{O}(1)$ cost when its capture group contains a loop (left) or another non-$\mathcal{O}(1)$ backreference (right). (b), (c) A backreference is evaluated non-$\mathcal{O}(1)$ times when it appears after a loop (b) or inside a loop (c).
  • ...and 3 more figures
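
The regex in the Figure 4 caption, a capture group around `a*` followed by `\1` and `b`, can be used to sketch how a backreference adds cost even without overlapping loops. The Python below is a hand-written, start-anchored backtracking simulation with a hypothetical cost model (one step per character comparison). It is not PCRE or Snort's matcher, and the function name `match_with_steps` is an assumption for illustration.

```python
def match_with_steps(text):
    """Simulate naive backtracking for the regex (a*) \\1 b, anchored at 0.

    Returns (matched, steps), where steps counts character comparisons
    under an assumed cost model.
    """
    steps = 0

    # Greedy a*: consume the longest run of 'a' starting at position 0.
    run = 0
    while run < len(text):
        steps += 1
        if text[run] != "a":
            break
        run += 1

    # Backtrack over capture lengths, longest first.
    for k in range(run, -1, -1):
        pos = k
        backref_ok = True
        # The backreference re-reads the k captured characters one by one.
        for i in range(k):
            steps += 1
            if pos + i >= len(text) or text[pos + i] != text[i]:
                backref_ok = False
                break
        if not backref_ok:
            continue
        pos += k
        steps += 1  # try to match the trailing 'b'
        if pos < len(text) and text[pos] == "b":
            return True, steps
    return False, steps

if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, match_with_steps("a" * n)[1])
```

On inputs of the form `a...a` with no trailing `b`, each candidate capture length forces the backreference to re-compare up to that many characters, so the step count in this sketch grows roughly quadratically with input length.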

Theorems & Definitions (19)

  • definition 1
  • Theorem A: Backtracking Runtime Bound [weideman2016sink]
  • Theorem B: Two-Overlap-Loop Characterization [weber1991ida, alluazen2008]
  • definition 2
  • definition 3
  • proof
  • lemma 1
  • proof
  • lemma 2
  • proof
  • ...and 9 more