Subsequence Matching and Analysis Problems for Formal Languages
Szilárd Zsolt Fazekas, Tore Koß, Florin Manea, Robert Mercaş, Timo Specht
TL;DR
This work studies subsequence-matching and analysis problems for languages given by grammars or automata, extending classical subsequence questions from strings to formal languages. It develops a general decidability framework based on two conditions (H1 and H2) that yield decidability for context-free languages (CFLs), proves undecidability for context-sensitive languages (CSLs), and provides efficient CFL algorithms for five subsequence-related problems. For certain problems, the results establish polynomial-time solvability as in the regular (REG) case. They also give an FPT algorithm, parameterized by the alphabet size, for exist_universal_largerthan_k, and a polynomial-time method to decide universal-for-all_m and to compute $\iota_\forall$. The work also introduces deterministic finite automata with translucent letters as an intermediate class and gives initial decidability and undecidability results for it. Finally, it discusses the boundary between decidable and undecidable cases and outlines future work on intermediate language classes and on computing downward closures for TFAs, aiming to further map the Chomsky-hierarchy landscape for subsequence analysis.
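For a single string $w$, the universality index $\iota(w)$ is the largest $k$ such that every string of length $k$ over the alphabet is a subsequence of $w$. A well-known way to compute it is the greedy arch factorization: cut $w$ from left to right into shortest factors that each contain every letter, and count them. The minimal Python sketch below illustrates only this string-level notion and does not compute the language-level $\iota_\forall$. The function names are hypothetical, and the brute-force checker is included only as an exponential cross-check.

```python
from itertools import product


def is_subsequence(u: str, w: str) -> bool:
    # Greedy left-to-right scan: each `c in it` consumes w until c is found,
    # so u is a subsequence of w iff every letter of u is matched in order.
    it = iter(w)
    return all(c in it for c in u)


def universality_index(w: str, alphabet: set[str]) -> int:
    """Count the arches of w: greedily cut w into shortest factors that each
    contain every letter of `alphabet`. The count is the largest k such that
    every string of length k over `alphabet` is a subsequence of w."""
    if not alphabet:
        raise ValueError("alphabet must be non-empty")
    arches = 0
    seen: set[str] = set()
    for c in w:
        if c in alphabet:
            seen.add(c)
        if len(seen) == len(alphabet):
            # Current arch is complete; start collecting the next one.
            arches += 1
            seen.clear()
    return arches


def is_k_universal_bruteforce(w: str, alphabet: str, k: int) -> bool:
    # Exponential cross-check: test all |alphabet|**k strings directly.
    return all(is_subsequence("".join(t), w) for t in product(alphabet, repeat=k))


if __name__ == "__main__":
    w = "abcacbab"  # arches: "abc" | "acb" | remainder "ab"
    print(universality_index(w, set("abc")))  # 2
    print(is_k_universal_bruteforce(w, "abc", 2))  # True
    print(is_k_universal_bruteforce(w, "abc", 3))  # False ("ccc" is missing)
```

The arch count runs in a single pass over $w$. Deciding $k$-universality for one string therefore reduces to the check $\iota(w) \geq k$, with no need to enumerate the $|\Sigma|^k$ candidate strings.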
Abstract
In this paper, we study a series of algorithmic problems related to the subsequences occurring in the strings of a given language, under the assumption that this language is succinctly represented by a grammar generating it, or an automaton accepting it. In particular, we focus on the following problems: Given a string $w$ and a language $L$, does there exist a word of $L$ which has $w$ as a subsequence? Do all words of $L$ have $w$ as a subsequence? Given an integer $k$ alongside $L$, does there exist a word of $L$ which has all strings of length $k$, over the alphabet of $L$, as subsequences? Do all words of $L$ have all strings of length $k$ as subsequences? For the last two problems, efficient algorithms were already presented in [Adamson et al., ISAAC 2023] for the case where $L$ is a regular language, and efficient solutions for the first two problems are easy to obtain in that case. We extend that work as follows: we give sufficient conditions on the class of input languages under which these problems are decidable; we provide efficient algorithms for all these problems in the case where the input language is context-free; and we show that all of these problems are undecidable for context-sensitive languages. Finally, we provide a series of initial results related to a class of languages that strictly includes the regular languages and is strictly included in the class of context-sensitive languages, but is incomparable to the class of context-free languages; these results deviate significantly from those reported for the language classes of the Chomsky hierarchy.
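For the first two problems with a context-free $L$, one simple approach is a fixpoint over the grammar. It is offered here as a sketch under stated assumptions, not as the algorithm from the paper. For each nonterminal $A$ and each position $i$ of $w$, it records how far a greedy left-to-right match of $w$ can get after reading some word derived from $A$. The existence problem uses the farthest such position, and the universality problem uses the nearest. Greedy matching progress is monotone in the starting position, so these extremes compose correctly along right-hand sides. The dict-based grammar encoding and all function names below are hypothetical.

```python
from typing import Callable, Optional

# Hypothetical grammar encoding: nonterminal -> list of right-hand sides.
# Any symbol that is not a key of the dict is treated as a terminal letter.
Grammar = dict[str, list[tuple[str, ...]]]


def progress_table(
    g: Grammar, w: str, pick: Callable[[int, int], int]
) -> dict[str, list[Optional[int]]]:
    """table[A][i]: extreme (max or min, chosen by `pick`) greedy progress
    through w, starting at position i, over all words derivable from A.
    None means no terminal word derivable from A has been found yet."""
    m = len(w)
    table: dict[str, list[Optional[int]]] = {a: [None] * (m + 1) for a in g}
    changed = True
    while changed:  # values only move monotonically, so this terminates
        changed = False
        for a, rhss in g.items():
            for rhs in rhss:
                for i in range(m + 1):
                    j: Optional[int] = i
                    for sym in rhs:
                        if sym in g:
                            j = table[sym][j]  # compose with the sub-result
                            if j is None:
                                break
                        elif j < m and w[j] == sym:
                            j += 1  # terminal matches the next letter of w
                    if j is None:
                        continue
                    old = table[a][i]
                    new = j if old is None else pick(old, j)
                    if new != old:
                        table[a][i] = new
                        changed = True
    return table


def exists_with_subsequence(g: Grammar, start: str, w: str) -> bool:
    # Some word of L(g) contains w iff the farthest progress from 0 is |w|.
    return progress_table(g, w, max)[start][0] == len(w)


def all_have_subsequence(g: Grammar, start: str, w: str) -> bool:
    # Every word of L(g) contains w iff even the nearest progress is |w|;
    # an empty language (None) satisfies this vacuously.
    worst = progress_table(g, w, min)[start][0]
    return worst is None or worst == len(w)


if __name__ == "__main__":
    G: Grammar = {"S": [("a", "S", "b"), ("c",)]}  # L = { a^n c b^n : n >= 0 }
    print(exists_with_subsequence(G, "S", "acb"))  # True  (e.g. "acb" itself)
    print(all_have_subsequence(G, "S", "acb"))     # False ("c" lacks it)
    print(all_have_subsequence(G, "S", "c"))       # True
```

The naive fixpoint runs in polynomial time in $|G|$ and $|w|$ but is not tuned. It is meant only to show why context-free inputs remain tractable for these two questions: the problems reduce to propagating matching progress through the grammar, much like intersecting $L$ with the regular language $\Sigma^* w_1 \Sigma^* w_2 \cdots \Sigma^*$.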
