A New Notion of Regularity: Finite State Automata Accepting Graphs

Yvo Ad Meeres

TL;DR

We provide a restricted DAG language class that permits the application of minimization and hyper-minimization algorithms known for DFAs. The resulting alternative notion of regularity rests on the existence of a DFA that recognizes a DAG language.

Abstract

Analogous to regular string and tree languages, regular languages of directed acyclic graphs (DAGs) are defined in the literature. Although called regular, those DAG languages are more powerful and, consequently, standard problems have a higher complexity than in the string case. Top-down as well as bottom-up deterministic DAG languages are subclasses of the regular DAG languages. We refine this hierarchy by providing a weaker subclass of the deterministic DAG languages. For a DAG grammar generating a language in this new DAG language class, or, equivalently, a DAG automaton recognizing it, a classical deterministic finite state automaton (DFA) can be constructed. As the main result, we provide a characterization of this class. The motivation behind this is the transfer of techniques for regular string languages to graphs. Trivially, our restricted DAG language class is closed under union and intersection. The DFA representation permits the application of minimization and hyper-minimization algorithms known for DFAs. This alternative notion of regularity rests on the existence of a DFA that recognizes a DAG language.
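The abstract's point is that once a DAG language has a DFA, standard DFA machinery applies. As a rough illustration of that machinery only (not the paper's construction of a DFA from a DAG grammar), the following sketch minimizes a complete DFA by Moore-style partition refinement. The names `minimize_dfa` and `accepts`, and the dictionary encoding of the transition function, are assumptions made for this example.

```python
def minimize_dfa(alphabet, delta, start, accepting):
    """Minimize a complete DFA by Moore-style partition refinement.

    delta maps (state, symbol) -> state and must be total on reachable states.
    Returns (states, delta, start, accepting) of the minimized DFA, whose
    states are integer block identifiers.
    """
    symbols = sorted(alphabet)

    # Step 1: keep only states reachable from the start state.
    order = [start]
    seen = {start}
    frontier = [start]
    while frontier:
        q = frontier.pop()
        for a in symbols:
            r = delta[(q, a)]
            if r not in seen:
                seen.add(r)
                order.append(r)
                frontier.append(r)

    # Step 2: start from the partition {accepting, non-accepting} and refine.
    block_of = {q: (q in accepting) for q in order}
    while True:
        ids = {}
        refined = {}
        for q in order:
            # A state's signature: its own block plus the block reached per symbol.
            sig = (block_of[q],) + tuple(block_of[delta[(q, a)]] for a in symbols)
            refined[q] = ids.setdefault(sig, len(ids))
        stable = len(ids) == len(set(block_of.values()))
        block_of = refined
        if stable:
            break

    # Step 3: build the quotient automaton over the final blocks.
    min_delta = {(block_of[q], a): block_of[delta[(q, a)]]
                 for q in order for a in symbols}
    min_accepting = {block_of[q] for q in order if q in accepting}
    return set(block_of.values()), min_delta, block_of[start], min_accepting


def accepts(delta, start, accepting, word):
    """Run a DFA on a word given as an iterable of symbols."""
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accepting


# Example: strings over {a, b} that end in 'a'; states 0 and 2 are equivalent.
example_delta = {
    (0, "a"): 1, (0, "b"): 2,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 2,
}
min_states, min_delta, min_start, min_acc = minimize_dfa(
    {"a", "b"}, example_delta, 0, {1}
)
```

Because each signature includes the state's current block, every round refines the previous partition, so the loop can stop as soon as the number of blocks no longer grows.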

Paper Structure (9 sections, 12 theorems, 3 equations, 6 figures)

Key Result

Theorem 2.9

The DAG language generated by a DAG grammar $\mathcal{G} = (N,\Sigma,R)$ without useless rules is infinite iff $R$ contains a rule cycle.
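Theorem 2.9 reduces the finiteness question to a cycle check over the rules. A minimal sketch of such a check follows, under a loud assumption: the relation "rule $r'$ can follow rule $r$ on a rule path" (Definition 2.7, not reproduced here) is taken as already computed and passed in as a plain adjacency dictionary. The name `has_rule_cycle` and this encoding are hypothetical, and the theorem's precondition (no useless rules) is not verified by the code.

```python
def has_rule_cycle(successors):
    """Detect a directed cycle in an assumed rule-successor relation.

    successors maps a rule name to the rule names that may follow it
    (hypothetical encoding; the relation itself must come from the grammar).
    Uses an iterative depth-first search with three colors.
    """
    WHITE, GREY, BLACK = 0, 1, 2
    rules = set(successors)
    for targets in successors.values():
        rules.update(targets)
    color = {r: WHITE for r in rules}

    for root in rules:
        if color[root] != WHITE:
            continue
        color[root] = GREY
        stack = [(root, iter(successors.get(root, ())))]
        while stack:
            rule, pending = stack[-1]
            for nxt in pending:
                if color[nxt] == GREY:
                    # Back edge to a rule on the current DFS path: a cycle.
                    return True
                if color[nxt] == WHITE:
                    color[nxt] = GREY
                    stack.append((nxt, iter(successors.get(nxt, ()))))
                    break
            else:
                # All successors explored without finding a cycle through rule.
                color[rule] = BLACK
                stack.pop()
    return False


# Under the assumptions above, a cycle would indicate an infinite language.
cyclic = has_rule_cycle({"r1": ["r2"], "r2": ["r1"]})
acyclic = has_rule_cycle({"r1": ["r2"], "r2": []})
```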

Figures (6)

  • Figure 1: Classical NLP parsing is blind to coreferences within sentences, since trees cannot represent these edges within the parse tree. Graphs, on the contrary, are capable of showing coreferences between, e.g., words of parsed sentences. For the above sentence, its parse trees could model neither the obvious possessive relation between the possessive pronoun his and the researcher nor the semantic equivalence, which requires world knowledge, between the paper and the work. A semantic graph such as an AMR DAG [quernheim-knight:12b], however, could. The capabilities of semantic graphs are illustrated in [daggrammar] and, for a complex sentence, in [DBLP:conf/mol/Drewes17], where a sentence is represented as an AMR DAG.
  • Figure 2: Overview of the language classes. In both Venn diagrams, the circle denotes the regular DAG languages whereas the oval denotes the language class $\mathtt{FD}$. The intersection of the two is the class $\mathtt{FID}$, which is closed both under edge swap and under DFA construction. The dotted part, the oval, corresponds to $\mathtt{FD}$. The non-dotted part corresponds to $\mathtt{ID}$. \ref{pic:regularity} Classically, the term regularity refers to FSAs and thus to string languages. This does not match the notion of regularity for DAG languages. The two notions match only for languages in $\mathtt{FID}$. \ref{pic:dag-classes} Top-down determinism and bottom-up determinism are colored in yellow and blue. Consequently, green stands for languages which are both top-down and bottom-up deterministic. In the right Venn diagram, all colored fields are deterministic, whereas the white part is nondeterministic. The class $\mathtt{ID}$ comprises those regular DAG languages which are not in $\mathtt{FID}$ (and consequently not in $\mathtt{FD}$), and which are either (top-down / bottom-up) deterministic or nondeterministic.
  • Figure 3: The grammar $\mathcal{G}_{star}$ gives rise to an FSA \ref{pic:astar} that accepts DAGs like \ref{pic:star} (labels are omitted).
  • Figure 4: The decorated one-pointed star $G$ is shown in \ref{pic:decstarG}. This equals the star $G_0$ in \ref{pic:decstarG0}, where in addition to the vertex label, the label's index allows us to reference each vertex uniquely by its number of copies. Note that in \ref{pic:decstarG0}, as well as in \ref{pic:decstarmany}, the index is not part of the vertex label. To draw the graph itself in those two pictures, the indices would be stripped off. Note that \ref{pic:decstarmany} illustrates the swapping of $k+1$ disjoint isomorphic copies to a $(k+1)$-pointed star $G(e \bowtie e)^k$.
  • Figure 5: Rule cycles and their graphs for the binary tree language given by a cycle and its chord
  • ...and 1 more figure

Theorems & Definitions (35)

  • Definition 2.1: Graph
  • Definition 2.2: Path
  • Definition 2.3: Chord Path
  • Definition 2.4: DAG, complete DAG, prefix-DAG
  • Definition 2.5: Regular DAG grammar, $L(\mathcal{G})$, $L(\mathcal{G})^{\&}$ [daggrammar]
  • Definition 2.6: Derivation DAG, $\lfloor D\rfloor$
  • Definition 2.7: Rule Path and Cycle
  • Theorem 2.9: Infinite Language [journals/iandc/BlumDrewes2019]
  • Definition 2.10: Swap
  • Lemma 2.11: Swap Preserves Generation [journals/iandc/BlumDrewes2019]
  • ...and 25 more