Packed Acyclic Deterministic Finite Automata
Hiroki Shibata, Masakazu Ishihata, Shunsuke Inenaga
TL;DR
This paper introduces the packed ADFA (PADFA), a compact variant of the ADFA designed to make pattern searching more efficient by encoding specific paths as packed strings stored in contiguous memory.
Abstract
An acyclic deterministic finite automaton (ADFA) is a data structure that represents a set of strings (i.e., a dictionary) and supports pattern searching, that is, determining whether a given pattern string is present in the dictionary. We introduce the packed ADFA (PADFA), a compact variant of the ADFA designed to make pattern searching more efficient by encoding specific paths as packed strings stored in contiguous memory. We show theoretically that pattern searching in a PADFA is near time-optimal, with a small additional overhead, and becomes fully time-optimal for sufficiently long patterns. Moreover, we prove that a PADFA requires fewer bits than a trie when the dictionary size is small relative to the number of states in the PADFA. Lastly, we show empirically that PADFAs improve both the space and time efficiency of pattern searching on real-world datasets.
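As a rough illustration of what "encoding specific paths as packed strings" could look like, the following Python sketch builds a plain trie (a simple special case of an acyclic automaton) and then merges each chain of non-accepting, single-successor states into one edge whose label is a single contiguous string. This is not the authors' construction: the names `Node`, `build_trie`, `pack`, and `contains`, and the rule for choosing which paths to pack, are assumptions made only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps the first character of an edge to (packed label, target state).
    edges: dict = field(default_factory=dict)
    accepting: bool = False


def build_trie(words):
    """Build an unpacked trie: every edge label is a single character."""
    root = Node()
    for word in words:
        node = root
        for ch in word:
            if ch not in node.edges:
                node.edges[ch] = (ch, Node())
            node = node.edges[ch][1]
        node.accepting = True
    return root


def pack(node):
    """Merge chains of non-accepting, single-successor states into one edge.

    Assumption for this sketch: a path is packed whenever its intermediate
    states have exactly one outgoing edge and are not accepting.
    """
    for ch, (label, child) in list(node.edges.items()):
        while len(child.edges) == 1 and not child.accepting:
            (next_label, next_child), = child.edges.values()
            label += next_label  # extend the contiguous packed label
            child = next_child
        node.edges[ch] = (label, child)
        pack(child)
    return node


def contains(root, pattern):
    """Return True if `pattern` is in the dictionary."""
    node, i = root, 0
    while i < len(pattern):
        edge = node.edges.get(pattern[i])
        if edge is None:
            return False
        label, child = edge
        # Compare a whole packed label against the pattern in one step.
        if not pattern.startswith(label, i):
            return False
        i += len(label)
        node = child
    return node.accepting


dictionary = pack(build_trie(["pack", "packed", "path", "automaton"]))
print(contains(dictionary, "packed"))    # True
print(contains(dictionary, "automata"))  # False
```

In this sketch the word "automaton" ends up on a single edge, so a search follows one packed label instead of nine single-character transitions. Any time or space bounds for the real PADFA come from the paper's analysis, not from this toy version.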
