Automata Size Reduction by Procedure Finding

Michal Šedý, Lukáš Holík

TL;DR

The paper presents a novel approach to reducing finite automata size by identifying and compressing repeating sub-graphs into shared procedures implemented with a single finite-domain register to store invocation context. The method constructs similarity graphs between invocations, creates procedures that replace multiple occurrences, and generates procedure transitions that preserve language while enabling nested, memory-aware execution. A gain-based heuristic selects promising similarity graphs, with post-processing steps to merge register symbols and remove vacuous guards, achieving substantial reductions in both states and transitions on benchmarks including Snort rules and ARMC/Z3-Noodler expressions. Empirical results show average reductions around 50% in states and 34% in transitions, with best cases exceeding 60–70% reductions, highlighting practical impact for FPGA-accelerated pattern matching where automata size is a critical bottleneck. The work opens avenues for extending procedure shapes, exploring other memory models (e.g., stacks), and integrating with existing automata toolchains to enable broader adoption.

Abstract

We introduce a novel paradigm for reducing the size of finite automata by compressing repeating sub-graphs. These repeating sub-graphs can be viewed as invocations of a single procedure. Instead of representing each invocation explicitly, they can be replaced by a single procedure that uses a small runtime memory to remember the call context. We elaborate on the technical details of a basic implementation of this idea, where the memory used by the procedures is a simple finite-state register. We propose methods for identifying repetitive sub-graphs, collapsing them into procedures, and measuring the resulting reduction in automata size. Already this basic implementation of reduction by procedure finding yields practically relevant results, particularly in the context of FPGA-accelerated pattern matching, where automata size is a primary bottleneck. We achieve a size reduction of up to 70% in automata that had already been minimized using existing advanced methods.
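To make the idea of a shared procedure with a register concrete, the following is a minimal sketch in Python. It is not the paper's algorithm: the example language (the two words xabx and yaby), the state names, the register values `from_x` and `from_y`, and the transition encoding are all assumptions chosen for illustration. The plain NFA duplicates the middle part "ab" in both branches, while the register version keeps one copy and uses the register to remember which branch invoked it.

```python
from itertools import product

# Plain NFA for the words "xabx" and "yaby": (state, symbol) -> next states.
# The middle part "ab" appears twice, once per branch.
NFA = {
    (0, "x"): {1}, (1, "a"): {2}, (2, "b"): {3}, (3, "x"): {7},
    (0, "y"): {4}, (4, "a"): {5}, (5, "b"): {6}, (6, "y"): {7},
}
NFA_FINAL = {7}


def nfa_accepts(word):
    """Standard subset simulation of the NFA."""
    current = {0}
    for sym in word:
        current = {t for s in current for t in NFA.get((s, sym), ())}
    return bool(current & NFA_FINAL)


# Hypothetical register automaton for the same language.
# Each transition is (source, symbol, guard, update, target):
#   guard  = register value required to fire (None means no guard),
#   update = value written to the register (None means keep the old value).
SRA = [
    ("q0", "x", None, "from_x", "p0"),   # entering the procedure from branch x
    ("q0", "y", None, "from_y", "p0"),   # entering the procedure from branch y
    ("p0", "a", None, None, "p1"),       # shared procedure body
    ("p1", "b", None, None, "p2"),
    ("p2", "x", "from_x", None, "qf"),   # return guarded by the call context
    ("p2", "y", "from_y", None, "qf"),
]


def sra_accepts(word):
    """Simulate over configurations (state, register value)."""
    configs = {("q0", None)}
    for sym in word:
        nxt = set()
        for state, reg in configs:
            for src, s, guard, update, dst in SRA:
                if src == state and s == sym and (guard is None or guard == reg):
                    nxt.add((dst, reg if update is None else update))
        configs = nxt
    return any(state == "qf" for state, _ in configs)


def languages_agree(max_len):
    """Brute-force comparison on all words over {x, y, a, b} up to max_len."""
    return all(
        nfa_accepts(w) == sra_accepts(w)
        for n in range(max_len + 1)
        for w in product("xyab", repeat=n)
    )
```

In this toy case the NFA has 8 states and 8 transitions, while the register version has 5 states and 6 transitions; the guards on the two return transitions are what keep a word such as xaby from being accepted. A call like `languages_agree(5)` compares the two on every word of length at most 5.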

Paper Structure

This paper contains 29 sections, 2 theorems, 1 equation, 9 figures, 2 tables, 3 algorithms.

Key Result

Lemma

Let $A$ be an SRA with nested procedures, and let $B$ be the SRA returned by Algorithm alg:mapProcedure. Then $L(A) = L(B)$ and $B$ has nested procedures.

Figures (9)

  • Figure 1: Reduction of an automaton compiled from $(xac^*ax)+(ya(a+b)y)$.
  • Figure 4: Before and after the creation of procedure transitions from invocation common transitions (dashed).
  • Figure 6: Before and after the creation of procedure transitions from invocation switch transitions (dashed).
  • Figure 7: An example illustrating the effect of the depth limit $d$ on the resulting SRA. The SRAs are constructed from an NFA representing a word $xa_1a_2\cdots a_{100}y$.
  • Figure 8: Impact of the depth limit $d$ on the size of the resulting SRAs in Figure 7.
  • ...and 4 more figures

Theorems & Definitions (2)

  • Lemma
  • Lemma