Regular Expressions with Backreferences and Lookaheads Capture NLOG

Yuya Uezato

TL;DR

The paper gives a precise characterization of REWBLk (REGEX with backreferences and lookaheads): its expressive power coincides with the complexity class $NLOG$, and its membership problem is $PSPACE$-complete. It builds on prior results that place REWB (REGEX with backreferences only) within both $NLOG$ and $IL$ and show that REWB membership is NP-complete, and it extends the expressiveness analysis to REWBLk. Lookaheads on their own do not add expressive power to REGEX, but combined with backreferences they make REGEX significantly more expressive. The key technical contribution is a translation of REWBLk into log-space nondeterministic Turing machines (NTMs). Negative lookaheads are the hardest part, because they require complementing a log-space NTM within nondeterministic log-space; the author handles this with the Immerman–Szelepcsényi theorem and uses log-space nested-oracle NTMs to handle nested lookaheads. Together, the results give a tight bound on the expressive power of REWBLk and settle the complexity of its membership problem.

Abstract

Backreferences and lookaheads are vital features to make classical regular expressions (REGEX) practical. Although these features have been widely used, understanding of their unrestricted combination has been limited. Practically, most likely no implementation fully supports them. Theoretically, while some studies have addressed these features separately, few have dared to combine them. Those few studies have made clear that the amalgamation of these features renders REGEX significantly expressive. However, no acceptable expressivity bound for REWBLk (REGEX with backreferences and lookaheads) has been established. We elucidate this by establishing that REWBLk coincides with NLOG, the class of languages accepted by log-space nondeterministic Turing machines (NTMs). In translating REWBLk to log-space NTMs, negative lookaheads are the most challenging part, since they essentially require complementing log-space NTMs in nondeterministic log-space. To address this problem, we revisit the Immerman–Szelepcsényi theorem. In addition, we employ log-space nested-oracle NTMs to naturally handle nested lookaheads of REWBLk. Utilizing such oracle machines, we also present the new result that the membership problem of REWBLk is PSPACE-complete.
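To make the two features concrete, here is a minimal sketch using Python's standard `re` module. It is an illustration only: Python's matching semantics are assumed here and are not the paper's formal definition of REWBLk, and the pattern names are hypothetical. The first pattern uses a backreference to describe the non-regular language $a^n b a^n$. The second places that pattern inside a negative lookahead, which accepts the complement over $\{a, b\}$ and hints at why negative lookaheads amount to complementation.

```python
import re

# Backreference: \1 must repeat exactly what group 1 captured,
# so a full match has the shape a^n b a^n (a non-regular language).
AN_B_AN = re.compile(r"(a*)b\1")

# Negative lookahead wrapping the same pattern: (?!...) succeeds only if
# the inner pattern cannot match from the current position, and it
# consumes no input. \Z anchors the inner pattern to the end of the string.
# The result accepts every string over {a, b} that is NOT of the form a^n b a^n.
NOT_AN_B_AN = re.compile(r"(?!(a*)b\1\Z)[ab]*")


def accepts(pattern: re.Pattern, text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for word in ["b", "aabaa", "aaba", "abba", ""]:
        print(repr(word), accepts(AN_B_AN, word), accepts(NOT_AN_B_AN, word))
```

On strings over $\{a, b\}$ the two patterns give opposite answers, which is the behaviour the log-space translation has to reproduce for negative lookaheads.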

Paper Structure (2 sections, 2 equations)
