Minimizing speculation overhead in a parallel recognizer for regular texts

Angelo Borsotti, Luca Breveglieri, Stefano Crespi Reghizzi, Angelo Morzenti

TL;DR

The paper tackles the overhead inherent in speculative data-parallel recognition of regular languages on multi-core architectures. It introduces the reduced-interface DFA (RI-DFA), a deterministic multi-entry automaton whose initial interface matches the NFA's state set, and uses it as the chunk automaton of a recognizer (RID) that minimizes speculative transitions while preserving language equivalence. The authors prove the RID correct, propose an initial-state minimization technique, and show experimentally that the RI-DFA reduces speculation: the resulting recognizer is much faster than the NFA-based baseline on all benchmarks and matches or clearly outperforms the DFA-based one, at a moderate extra construction cost. The approach is reported to be compatible with other optimization strategies for finite-state machines. The findings suggest that the RI-DFA is particularly advantageous when the NFA is small relative to the DFA, which makes it a viable path to efficient recognition on commodity multi-core systems.

Abstract

Speculative data-parallel algorithms for language recognition have been widely experimented with for various types of finite-state automata (FA), deterministic (DFA) and nondeterministic (NFA), often derived from regular expressions (RE). Such an algorithm cuts the input string into chunks, independently recognizes each chunk in parallel by means of identical FAs, and finally joins the chunk results and checks overall consistency. In chunk recognition, it is necessary to speculatively start the FAs in any state, which causes an overhead that reduces the speedup compared to a serial algorithm. Existing data-parallel DFA-based recognizers suffer from the excessive number of starting states, and the NFA-based ones suffer from the number of nondeterministic transitions. Our data-parallel algorithm is based on a new FA type called reduced-interface DFA (RI-DFA), which minimizes the speculation overhead without incurring the penalty of nondeterministic transitions or of impractically enlarged DFA machines. The algorithm is proved to be correct and theoretically efficient, because it combines the state reduction of an NFA with the speed of deterministic transitions, thus improving on both DFA-based and NFA-based existing implementations. The practical applicability of the RI-DFA approach is confirmed by a quantitative comparison of the number of starting states for a large public benchmark of complex FAs. On multi-core computing architectures, the RI-DFA recognizer is much faster than the NFA-based one on all benchmarks, while it matches the DFA-based one on some benchmarks and performs much better on others. The extra time cost needed to construct an RI-DFA compared to a DFA is moderate and compatible with practical use.
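The chunk-and-join scheme described in the abstract can be illustrated with a minimal sketch of the classical DFA-based variant. This is not the authors' implementation: the toy automaton `DFA` (even number of `a`), the helper names `run`, `reach`, `split` and `recognize_parallel`, and the use of a thread pool are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy DFA over {a, b}: accepts strings with an even number of 'a'.
DFA = {
    "states": [0, 1],
    "initial": 0,
    "final": {0},
    "delta": {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1},
}

def run(dfa, state, text):
    """Serial run from `state`; returns the end state, or None if the run dies."""
    for ch in text:
        state = dfa["delta"].get((state, ch))
        if state is None:
            return None
    return state

def reach(dfa, chunk, starts):
    """Reach phase for one chunk: one speculative run per starting state."""
    return {s: run(dfa, s, chunk) for s in starts}

def split(text, n):
    """Cut the input into at most n chunks of (almost) equal length."""
    size = max(1, -(-len(text) // n))  # ceiling division
    return [text[i:i + size] for i in range(0, len(text), size)] or [""]

def recognize_parallel(dfa, text, n_chunks=2):
    chunks = split(text, n_chunks)
    # Only the first chunk knows its true start; the others speculate on every state.
    starts = [[dfa["initial"]]] + [dfa["states"]] * (len(chunks) - 1)
    with ThreadPoolExecutor() as pool:
        maps = list(pool.map(lambda c, s: reach(dfa, c, s), chunks, starts))
    # Join phase: chain the per-chunk start-to-end maps from left to right.
    state = dfa["initial"]
    for m in maps:
        state = m.get(state)
        if state is None:
            return False
    return state in dfa["final"]
```

In this sketch every chunk after the first performs one run per DFA state, which is the speculation overhead the abstract refers to; it grows with the number of starting states.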

Paper Structure

This paper contains 14 sections, 4 theorems, 3 equations, 8 figures, 3 tables.

Key Result

theorem 1

The RID accepts the same language as the NFA $N$.

Figures (8)

  • Figure 1: Top: NFA with the equivalent powerset DFA (minimal) and the new RI-DFA, over the alphabet $\Sigma$. The states that act as initial in the CA are in green. Bottom: transitions executed by the reach phase for the string "$aabcab$" divided into two chunks, join of the ending and starting states of adjacent chunks, and number of transitions.
  • Figure 2: CSDPA device using DFA. In the CA (top right) all states are initial and final. To recognize the two-chunk input $bab \cdot aaa$, nine transitions are executed; CA $A_2$ executes two $3$-step runs scanning the entire chunk. The join phase (bottom) computes $\text{PLAS}_2$.
  • Figure 3: Left: the given NFA $N$. Right: the RI-DFA $B$ obtained by incrementally adding to $N(0)$ first $N(1)$ and then $N(2)$. The states $I^B = \{ \left\{ 0 \right\}, \left\{ 1 \right\}, \left\{ 2 \right\} \}$ coloured in green act as initial.
  • Figure 4: NFA, runs of CAs $B_1$ and $B_2$, and interface function $\emph{if}$.
  • Figure 5: Interface minimization of an RI-DFA. Top: NFA $N$. Bottom: RI-DFA with initial states $p_0$, $p_1$, $p_2$ and $p_3$ (in green). States $p_1$ and $p_3$ are indistinguishable and state $p_3$ (arbitrarily chosen) is downgraded from initial to non-initial, thus reducing the initial state set to $\left\{ p_0, \, p_1, \, p_2 \right\}$ (dashed box). The content of state $p_1$ has to be updated to $13$ (not shown here), to adjust the interface function $\emph{if}$.
  • ...and 3 more figures
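
The construction outlined in the caption of Figure 3 (a deterministic automaton whose initial states are the singletons of the NFA states) can be sketched as a subset construction seeded from every singleton, with shared subsets stored once. This is only an illustrative reading of the captions, not the paper's algorithm: the toy automaton `NFA` (strings ending in `ab`), the function names, and the join rule, which takes the union of the end sets of the singleton runs, are assumptions, and the paper's interface function may be defined differently.

```python
# Hypothetical toy NFA over {a, b}: accepts strings that end with "ab".
NFA = {
    "states": [0, 1, 2],
    "alphabet": "ab",
    "initial": 0,
    "final": {2},
    "delta": {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}},
}

def build_ri_dfa(nfa_delta, nfa_states, alphabet):
    """Subset construction seeded from every singleton {q}; subsets are shared."""
    seeds = [frozenset([q]) for q in nfa_states]
    states, todo, delta = set(seeds), list(seeds), {}
    while todo:
        src = todo.pop()
        for sym in alphabet:
            dst = frozenset(t for q in src for t in nfa_delta.get((q, sym), ()))
            if not dst:
                continue  # no transition: the run dies on this symbol
            delta[(src, sym)] = dst
            if dst not in states:
                states.add(dst)
                todo.append(dst)
    return states, delta

def reach_chunk(delta, chunk, starts):
    """One deterministic run per singleton start; maps q to the set reached from {q}."""
    result = {}
    for q in starts:
        cur = frozenset([q])
        for sym in chunk:
            cur = delta.get((cur, sym))
            if cur is None:
                cur = frozenset()
                break
        result[q] = cur
    return result

def recognize_chunked(nfa, text, n_chunks=2):
    _, delta = build_ri_dfa(nfa["delta"], nfa["states"], nfa["alphabet"])
    size = max(1, -(-len(text) // n_chunks))  # ceiling division
    chunks = [text[i:i + size] for i in range(0, len(text), size)] or [""]
    # The chunks are independent, so these runs could be executed in parallel.
    maps = [
        reach_chunk(delta, c, [nfa["initial"]] if i == 0 else nfa["states"])
        for i, c in enumerate(chunks)
    ]
    # Join: the NFA states reached so far select which singleton runs to follow.
    current = {nfa["initial"]}
    for m in maps:
        current = set().union(*(m[q] for q in current))
    return bool(current & nfa["final"])
```

In this sketch the number of speculative runs per chunk equals the number of NFA states, while each run still takes one deterministic transition per input character. On this toy automaton the powerset DFA is as small as the NFA, so no saving should be expected here; the example only illustrates the mechanics.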

Theorems & Definitions (6)

  • theorem 1: Correctness
  • lemma 1
  • proof
  • lemma 2
  • proof
  • theorem 2: Minimality