Repairing Regex Vulnerabilities via Localization-Guided Instructions

Sicheol Sung; Joonghyuk Hahn; Yo-Sub Han

TL;DR

The paper tackles ReDoS vulnerabilities in regexes by introducing Localized Regex Repair (LRR), a hybrid framework that localizes the vulnerability with a symbolic module and then uses an LLM to generate a semantically equivalent repair for the targeted sub-pattern. By decoupling localization from repair, LRR combines precise pattern localization with the generalization capabilities of LLMs to handle complex cases that rule-based methods miss, achieving a reported improvement of up to 15.4 percentage points in repair rate over prior approaches. The evaluation, conducted on a 1000-regex polyglot dataset and multiple LLMs, reveals a trade-off between syntactic alterations and semantic fidelity, with reasoning LLMs often delivering higher semantic safety at the cost of longer repairs. The work demonstrates a practical, scalable path toward automated, reliable regex repair, while acknowledging limitations in the localization heuristic and the empirical nature of invulnerability validation.

Abstract

Regular expressions (regexes) are foundational to modern computing for critical tasks like input validation and data parsing, yet their ubiquity exposes systems to regular expression denial of service (ReDoS), a vulnerability requiring automated repair methods. Current approaches, however, are hampered by a trade-off. Symbolic, rule-based systems are precise but fail to repair unseen or complex vulnerability patterns. Conversely, large language models (LLMs) possess the necessary generalizability but are unreliable for tasks demanding strict syntactic and semantic correctness. We resolve this impasse by introducing a hybrid framework, localized regex repair (LRR), designed to harness LLM generalization while enforcing reliability. Our core insight is to decouple problem identification from the repair process. First, a deterministic, symbolic module localizes the precise vulnerable subpattern, creating a constrained and tractable problem space. Then, the LLM is invoked to generate a semantically equivalent fix for this isolated segment. This combined architecture successfully resolves complex repair cases intractable for rule-based repair while avoiding the semantic errors of LLM-only approaches. Our work provides a validated methodology for solving such problems in automated repair, improving the repair rate by 15.4 percentage points over the state-of-the-art. Our code is available at https://github.com/cdltlehf/LRR.
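
The two-stage idea described in the abstract (localize first, then repair only the localized segment) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the nested-quantifier heuristic `NESTED_PLUS`, the function names `localize`, `stub_repair`, `localized_repair`, and `agree_on_samples`, and the rule-based stand-in for the LLM call are all assumptions made here for illustration. The heuristic only recognizes toy cases such as `(a+)+`, and the sample-based check is empirical, not a proof of equivalence.

```python
import re
from typing import Callable, Iterable, Optional, Tuple

# Toy localization heuristic (an assumption, not the paper's symbolic module):
# flag a group whose body is a single atom under `+`, itself repeated by `+`,
# e.g. (a+)+ or (\d+)+ or ([a-z]+)+.
NESTED_PLUS = re.compile(r"\((\\?\w|\[[^\]]+\])\+\)\+")


def localize(pattern: str) -> Optional[Tuple[int, int]]:
    """Return the (start, end) span of the first suspicious subpattern, if any."""
    match = NESTED_PLUS.search(pattern)
    return match.span() if match else None


def stub_repair(subpattern: str) -> str:
    """Placeholder for the LLM call: drop the outer `+`, so (x+)+ becomes (x+)."""
    return subpattern[:-1]


def localized_repair(pattern: str, repair: Callable[[str], str] = stub_repair) -> str:
    """Rewrite only the localized span and leave the rest of the regex untouched."""
    span = localize(pattern)
    if span is None:
        return pattern
    start, end = span
    return pattern[:start] + repair(pattern[start:end]) + pattern[end:]


def agree_on_samples(original: str, repaired: str, samples: Iterable[str]) -> bool:
    """Empirical check only: both regexes accept or reject each sample alike."""
    return all(
        bool(re.search(original, s)) == bool(re.search(repaired, s)) for s in samples
    )


original = r"^(a+)+$"
repaired = localized_repair(original)
samples = ["", "a", "aaa", "aaab", "baa"]  # kept short so the original stays fast
print(repaired, agree_on_samples(original, repaired, samples))
```

In a setup closer to the one the abstract describes, `stub_repair` would be replaced by a call that passes only the localized subpattern to an LLM, and the returned fix would still be checked before being accepted.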
