On the Hardness of Learning Regular Expressions

Idan Attias, Lev Reyzin, Nathan Srebro, Gal Vardi

TL;DR

This paper studies the computational hardness of learning regular expressions (REs). It establishes hardness of PAC learning and of learning with membership queries (MQ), both in the distribution-free setting and under the uniform distribution, including for REs extended with complement and intersection. For distribution-free hardness, the authors use reductions from DNFs to REs. For hardness under the uniform distribution, they develop a framework based on local pseudorandom generators (PRGs), showing that even weak learning of REs of size $n^{\epsilon}$ is intractable. They also translate Boolean formulas into extended REs. Under cryptographic assumptions, this transfers hardness results to $\mathsf{RE}(\cap)$ and $\mathsf{RE}(\neg)$ under the uniform distribution. A key message is that learnability is governed by the chosen description language and its size measure (here, RE length), rather than by the class of regular languages itself. This underscores the role of representation in assessing tractability, and the paper leaves several open questions for plain REs and NFAs under the uniform distribution.

Abstract

Despite the theoretical significance and wide practical use of regular expressions, the computational complexity of learning them has been largely unexplored. We study the computational hardness of improperly learning regular expressions in the PAC model and with membership queries. We show that PAC learning is hard even under the uniform distribution on the hypercube, and also prove hardness of distribution-free learning with membership queries. Furthermore, if regular expressions are extended with complement or intersection, we establish hardness of learning with membership queries even under the uniform distribution. We emphasize that these results do not follow from existing hardness results for learning DFAs or NFAs, since the descriptive complexity of regular languages can differ exponentially between DFAs, NFAs, and regular expressions.
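
The exponential gap in descriptive complexity mentioned above can be seen on a standard textbook example. This is our illustration, not taken from the paper, and it shows only one direction of the gap. Let $L_k$ be the set of binary strings whose $k$-th symbol from the end is 1. It has the regular expression $(0+1)^*1(0+1)^{k-1}$ of length $O(k)$ and an NFA with $k+1$ states, yet its minimal DFA has $2^k$ states.

The Python sketch below does three things:

* builds that NFA;
* runs the subset construction;
* minimizes the result with Moore's partition refinement.

All function names are hypothetical, and the regex uses Python's `|` for union.

```python
from collections import deque

# Illustration (not from the paper): L_k = binary strings whose k-th symbol
# from the end is 1. All function names here are hypothetical.

def regex_for(k):
    """(0|1)*1(0|1)^(k-1) with the repetition written out: 5k + 2 characters."""
    return "(0|1)*1" + "(0|1)" * (k - 1)

def nfa_step(state, bit, k):
    """Transitions of the (k+1)-state NFA for L_k; state k is accepting."""
    if state == 0:
        # Stay in 0, or (on a 1) guess that this 1 is the k-th symbol from the end.
        return {0, 1} if bit == 1 else {0}
    # Count off the remaining k-1 symbols; state k has no outgoing moves.
    return {state + 1} if state < k else set()

def subset_dfa(k):
    """Subset construction, keeping only the reachable subsets."""
    start = frozenset({0})
    states, trans, queue = {start}, {}, deque([start])
    while queue:
        s = queue.popleft()
        for bit in (0, 1):
            t = frozenset(q for p in s for q in nfa_step(p, bit, k))
            trans[(s, bit)] = t
            if t not in states:
                states.add(t)
                queue.append(t)
    accepting = {s for s in states if k in s}
    return states, trans, accepting

def minimal_dfa_size(k):
    """Moore's partition refinement; the stable block count is the minimal DFA size."""
    states, trans, accepting = subset_dfa(k)
    block = {s: int(s in accepting) for s in states}
    while True:
        # A state's signature: its own block plus the blocks of its two successors.
        sig = {s: (block[s], block[trans[(s, 0)]], block[trans[(s, 1)]]) for s in states}
        ids = {v: i for i, v in enumerate(sorted(set(sig.values())))}
        new_block = {s: ids[sig[s]] for s in states}
        # Each round refines the last; an unchanged block count means it is stable.
        if len(set(new_block.values())) == len(set(block.values())):
            return len(set(new_block.values()))
        block = new_block

for k in range(1, 7):
    print(k, len(regex_for(k)), minimal_dfa_size(k))
```

The regex grows linearly, as $5k+2$ characters, and the NFA stays at $k+1$ states. The minimal DFA, however, doubles with each $k$ and reaches $2^k$ states.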

Paper Structure

This paper contains 29 sections, 6 theorems, 18 equations, 2 tables.

Key Result

Lemma 4.1

Let $\Phi:\{0,1\}^n\to\{0,1\}$ be a DNF with $m(n)$ terms. There is a plain regular expression $R_\Phi$ over $\{0,1\}$ of size $O(mn)$ such that for every $\boldsymbol{x}\in\{0,1\}^n$, $\boldsymbol{x} \in L(R_\Phi) \leftrightarrow \Phi(\boldsymbol{x})=1$.
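
Below is a minimal Python sketch of one natural way to realize Lemma 4.1. It is an assumption on our part, and the paper's construction may differ in details.

* Each term becomes a length-$n$ pattern that fixes the bit of every literal in the term and allows either bit elsewhere.
* $R_\Phi$ is the union of these patterns.
* Each pattern has $O(n)$ symbols, so $R_\Phi$ has $O(mn)$ symbols for $m$ terms.

The DNF encoding and all function names are hypothetical. Python's `re` writes union as `|`, and `re.fullmatch` is used to check the stated equivalence on every point of the hypercube.

```python
import itertools
import re

# Hypothetical encoding (assumption for this sketch): a DNF over x_0..x_{n-1}
# is a list of terms; each term maps a variable index to the bit it requires
# (1 for a positive literal x_i, 0 for a negated literal NOT x_i).

def term_to_regex(term, n):
    """One piece per position: the required bit if x_i occurs in the term,
    otherwise (0|1). The piece matches exactly the satisfying length-n strings."""
    return "".join(str(term[i]) if i in term else "(0|1)" for i in range(n))

def dnf_to_regex(dnf, n):
    """Union of the per-term pieces: O(n) symbols per term, O(m*n) in total."""
    if not dnf:
        return "(?!)"  # Python stand-in for the empty language (no terms: Phi == 0)
    return "|".join("(" + term_to_regex(t, n) + ")" for t in dnf)

def eval_dnf(dnf, x):
    """Evaluate the DNF directly on a 0/1 tuple x."""
    return any(all(x[i] == b for i, b in t.items()) for t in dnf)

def agrees_on_cube(dnf, n):
    """Check x in L(R_Phi) <=> Phi(x) = 1 for every x in {0,1}^n."""
    pattern = re.compile(dnf_to_regex(dnf, n))
    for x in itertools.product((0, 1), repeat=n):
        s = "".join(map(str, x))
        if (pattern.fullmatch(s) is not None) != eval_dnf(dnf, x):
            return False
    return True

# Example: Phi = (x_0 AND NOT x_2) OR x_1 with n = 3.
phi = [{0: 1, 2: 0}, {1: 1}]
print(dnf_to_regex(phi, 3))    # (1(0|1)0)|((0|1)1(0|1))
print(agrees_on_cube(phi, 3))  # True
```

Only strings of length $n$ matter here, so the pieces need no star.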

Theorems & Definitions (11)

  • Lemma 4.1: Polynomial-Size DNF $\to$ RE Translation
  • Proof
  • Theorem 4.2: Distribution-Free Hardness of PAC Learning REs
  • Proof
  • Theorem 4.4: Hardness of PAC Learning REs under the Uniform Distribution
  • Theorem 5.1: Distribution-Free Hardness of PAC+MQ Learning REs
  • Lemma 5.2: Polynomial-Size Boolean Formula $\to$ $\mathsf{RE}(\neg)$ / $\mathsf{RE}(\cap)$ Translation
  • Proof
  • Theorem 5.3: Hardness of PAC+MQ Learning $\mathsf{RE}(\cap)$ or $\mathsf{RE}(\neg)$ Under the Uniform Distribution
  • Proof
  • ...and 1 more
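
Lemma 5.2 can be illustrated in the same spirit. The sketch below uses an assumed construction, not necessarily the paper's. A literal $x_i$ becomes $(0+1)^{i}\,1\,(0+1)^{n-1-i}$ (with 0-based $i$), and the two translations then differ as follows.

* **$\mathsf{RE}(\neg)$:** negation maps to complement, and $A \wedge B$ is written $\neg(\neg A + \neg B)$.
* **$\mathsf{RE}(\cap)$:** negations are pushed down to the literals by De Morgan's laws, and $\wedge$ maps to intersection.

Both outputs have size $O(sn)$ for a formula of size $s$. They agree with the formula on $\{0,1\}^n$; strings of other lengths play no role there. Python's `re` has no complement or intersection, so membership is decided with a small Brzozowski-derivative matcher. The tuple encodings and all function names are hypothetical.

```python
import itertools

# Hypothetical encodings for this sketch (not from the paper):
#   extended REs: ("empty",) ("eps",) ("sym", c) ("alt", r, s) ("cat", r, s)
#                 ("and", r, s) = intersection, ("not", r) = complement
#   formulas:     ("var", i) ("NOT", f) ("AND", f, g) ("OR", f, g)
EMPTY, EPS = ("empty",), ("eps",)

def sym(c): return ("sym", c)
def comp(r): return ("not", r)
def var(i): return ("var", i)

def alt(r, s):  # union, dropping empty branches
    if r == EMPTY: return s
    if s == EMPTY: return r
    return ("alt", r, s)

def cat(r, s):  # concatenation with the identities for EMPTY and EPS
    if EMPTY in (r, s): return EMPTY
    if r == EPS: return s
    if s == EPS: return r
    return ("cat", r, s)

def inter(r, s):  # intersection
    if EMPTY in (r, s): return EMPTY
    return ("and", r, s)

def nullable(r):
    """Does L(r) contain the empty string?"""
    tag = r[0]
    if tag == "eps": return True
    if tag in ("empty", "sym"): return False
    if tag == "alt": return nullable(r[1]) or nullable(r[2])
    if tag in ("cat", "and"): return nullable(r[1]) and nullable(r[2])
    return not nullable(r[1])  # "not"

def deriv(r, c):
    """Brzozowski derivative: an expression for {w : c + w in L(r)}."""
    tag = r[0]
    if tag in ("empty", "eps"): return EMPTY
    if tag == "sym": return EPS if r[1] == c else EMPTY
    if tag == "alt": return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "and": return inter(deriv(r[1], c), deriv(r[2], c))
    if tag == "not": return comp(deriv(r[1], c))
    d = cat(deriv(r[1], c), r[2])  # remaining tag: "cat"
    return alt(d, deriv(r[2], c)) if nullable(r[1]) else d

def matches(r, w):
    for c in w:
        r = deriv(r, c)
    return nullable(r)

SIGMA = alt(sym("0"), sym("1"))

def literal(i, bit, n):
    """(0+1)^i bit (0+1)^(n-1-i): length-n strings whose i-th symbol is bit."""
    r = EPS
    for j in reversed(range(n)):
        r = cat(sym(str(bit)) if j == i else SIGMA, r)
    return r

def to_re_not(f, n):
    """Complement only: NOT -> complement, AND -> not(not A + not B)."""
    op = f[0]
    if op == "var": return literal(f[1], 1, n)
    if op == "NOT": return comp(to_re_not(f[1], n))
    a, b = to_re_not(f[1], n), to_re_not(f[2], n)
    return alt(a, b) if op == "OR" else comp(alt(comp(a), comp(b)))

def to_re_and(f, n, positive=True):
    """Intersection only: push NOT down to the literals (De Morgan)."""
    op = f[0]
    if op == "var": return literal(f[1], 1 if positive else 0, n)
    if op == "NOT": return to_re_and(f[1], n, not positive)
    a, b = to_re_and(f[1], n, positive), to_re_and(f[2], n, positive)
    return inter(a, b) if (op == "AND") == positive else alt(a, b)

def eval_formula(f, x):
    op = f[0]
    if op == "var": return x[f[1]] == 1
    if op == "NOT": return not eval_formula(f[1], x)
    a, b = eval_formula(f[1], x), eval_formula(f[2], x)
    return (a and b) if op == "AND" else (a or b)

def agrees(f, n, translate):
    """Check x in L(translate(f, n)) <=> f(x) = 1 for every x in {0,1}^n."""
    r = translate(f, n)
    return all(matches(r, "".join(map(str, x))) == eval_formula(f, x)
               for x in itertools.product((0, 1), repeat=n))

# (x0 XOR x1) AND NOT x2, with XOR spelled out using AND/OR/NOT
f = ("AND", ("OR", ("AND", var(0), ("NOT", var(1))),
                   ("AND", ("NOT", var(0)), var(1))), ("NOT", var(2)))
print(agrees(f, 3, to_re_not), agrees(f, 3, to_re_and))
```

Both checks should print `True`. In this sketch the only difference between the two translations is how $\neg$ is absorbed: the first uses complement, and the second uses negated literals plus intersection.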