Automata Size Reduction by Procedure Finding
Michal Šedý, Lukáš Holík
TL;DR
The paper presents an approach to reducing the size of finite automata by identifying repeating sub-graphs and compressing them into shared procedures, which use a single finite-domain register to store the invocation context. The method constructs similarity graphs between invocations, creates procedures that replace multiple occurrences, and generates procedure transitions that preserve the language of the automaton while allowing nested, memory-aware execution. A gain-based heuristic selects promising similarity graphs, and post-processing steps merge register symbols and remove vacuous guards. On benchmarks including Snort rules and ARMC/Z3-Noodler expressions, the reported reductions average around 50% in states and 34% in transitions, with best cases exceeding 60–70%. This matters in practice for FPGA-accelerated pattern matching, where automata size is a critical bottleneck. The work opens avenues for extending procedure shapes, exploring other memory models (e.g., stacks), and integrating with existing automata toolchains to enable broader adoption.
Abstract
We introduce a novel paradigm for reducing the size of finite automata by compressing repeating sub-graphs. These repeating sub-graphs can be viewed as invocations of a single procedure. Instead of representing each invocation explicitly, the automaton can replace them with a single procedure that uses a small runtime memory to remember the call context. We elaborate on the technical details of a basic implementation of this idea, where the memory used by the procedures is a simple finite-state register. We propose methods for identifying repetitive sub-graphs, collapsing them into procedures, and measuring the resulting reduction in automata size. Even this basic implementation of reduction by procedure finding yields practically relevant results, particularly in the context of FPGA-accelerated pattern matching, where automata size is a primary bottleneck. We achieve a size reduction of up to 70% in automata that had already been minimized using existing advanced methods.
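To make the core idea concrete, the following is a minimal hand-built sketch in Python, not the authors' algorithm or tooling. It assumes a toy language, a·xyz·b | c·xyz·d, in which the chain x-y-z occurs once per calling context. Classical state merging cannot unify the two copies, because the symbol allowed after the chain depends on how it was entered. The second automaton shares one copy of the chain and records the entry context in a register with values 1 and 2. All names (`EXPLICIT`, `SHARED`, `accepts_explicit`, `accepts_shared`) and the transition encoding are hypothetical, and the procedure is written by hand rather than discovered automatically.

```python
# Explicit DFA: (state, symbol) -> next state.
# The chain x-y-z is duplicated for the two contexts (entered by 'a' or by 'c').
EXPLICIT = {
    ("q0", "a"): "p1", ("p1", "x"): "p2", ("p2", "y"): "p3",
    ("p3", "z"): "p4", ("p4", "b"): "acc",
    ("q0", "c"): "r1", ("r1", "x"): "r2", ("r2", "y"): "r3",
    ("r3", "z"): "r4", ("r4", "d"): "acc",
}

# Register automaton: (state, symbol) -> (guard, update, next state).
# guard:  register value required to take the transition, or None for no check.
# update: value written to the register, or None to leave it unchanged.
SHARED = {
    ("q0", "a"): (None, 1, "s1"),     # "call" from context 1
    ("q0", "c"): (None, 2, "s1"),     # "call" from context 2
    ("s1", "x"): (None, None, "s2"),  # shared procedure body
    ("s2", "y"): (None, None, "s3"),
    ("s3", "z"): (None, None, "s4"),
    ("s4", "b"): (1, None, "acc"),    # "return" allowed only for context 1
    ("s4", "d"): (2, None, "acc"),    # "return" allowed only for context 2
}


def accepts_explicit(word):
    """Run the explicit DFA; a missing transition rejects the word."""
    state = "q0"
    for symbol in word:
        state = EXPLICIT.get((state, symbol))
        if state is None:
            return False
    return state == "acc"


def accepts_shared(word):
    """Run the register automaton, checking guards before applying updates."""
    state, register = "q0", None
    for symbol in word:
        step = SHARED.get((state, symbol))
        if step is None:
            return False
        guard, update, target = step
        if guard is not None and register != guard:
            return False
        if update is not None:
            register = update
        state = target
    return state == "acc"


explicit_states = {s for s, _ in EXPLICIT} | set(EXPLICIT.values())
shared_states = {s for s, _ in SHARED} | {t for _, _, t in SHARED.values()}

if __name__ == "__main__":
    for word in ["axyzb", "cxyzd", "axyzd", "cxyzb"]:
        print(word, accepts_explicit(word), accepts_shared(word))
    print(len(explicit_states), "states vs", len(shared_states), "states")
```

In this toy case the explicit automaton has 10 states and 10 transitions, while the shared version has 6 states and 7 transitions, at the cost of one register and two guarded transitions. The guards on the exit transitions are what keep the language unchanged: without them the shared automaton would also accept mixed words such as `axyzd`.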
