CORGI: Efficient Pattern Matching With Quadratic Guarantees
Daniel Weitekamp
TL;DR
Pattern matching in forward-chaining rule systems can require exponential $O(N^K)$ time and space in the worst case when rules have underconstrained variables. The authors present CORGI (Collection-Oriented Relational Graph Iteration), a two-phase approach that builds a forward relation graph and then generates matches by walking backward through its mappings. This avoids storing full conflict sets and achieves quadratic $O(KN^2)$ time and space for finding a single match. CORGI can stream subsequent matches without enumerating all possibilities, and it substantially outperforms RETE-based implementations (e.g., OPS5 and SOAR) on a combinatorial Valentine matching task. By providing quadratic guarantees and memory-efficient streaming of matches within the CRE toolset, this work supports real-time cognitive systems and low-latency querying for rules learned or synthesized by AI agents.
Abstract
Rule-based systems must solve complex matching problems within tight time constraints to be effective in real-time applications, such as planning and reactive control for AI agents, as well as low-latency relational database querying. Pattern-matching systems can require exponential time and space to find matches for rules that have many underconstrained variables, or for rules that are otherwise well-constrained but produce a combinatorial number of intermediate partial matches. When online AI systems automatically generate rules through example-driven induction or code synthesis, they can easily produce worst-case matching patterns that slow program execution or halt it by exhausting available memory. In our own work with cognitive systems that learn from examples, we have found that aggressive forms of anti-unification-based generalization can easily produce these circumstances. To make these systems practical without hand-engineering constraints or succumbing to unpredictable failure modes, we introduce a new matching algorithm called CORGI (Collection-Oriented Relational Graph Iteration). Unlike RETE-based approaches, CORGI offers quadratic time and space guarantees for finding a single satisficing match, and the ability to iteratively stream subsequent matches without committing entire conflict sets to memory. CORGI differs from RETE in that it does not have a traditional $\beta$-memory for collecting partial matches. Instead, CORGI takes a two-step approach: a graph of grounded relations is built and maintained in a forward pass, and an iterator generates matches as needed by working backward through the graph. This approach eliminates the long delays and memory overflows that can result from populating full conflict sets. In a performance evaluation, we demonstrate that CORGI significantly outperforms the RETE implementations in SOAR and OPS5 on a simple combinatorial matching task.
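To make the two-step idea concrete, the following is a minimal toy sketch under strong simplifying assumptions. It is not CORGI itself and not the CRE API. It assumes a chain-shaped pattern of K variables, each with a candidate domain of at most N facts, where each variable is constrained only by a hypothetical binary relation `rel` to its predecessor. The names `build_forward_graph`, `iter_matches`, and `rel` are illustrative inventions. The forward pass stores, for every reachable candidate, its reachable predecessors, which is at most $N^2$ edges per adjacent pair, or $O(KN^2)$ overall. The backward iterator then yields full matches lazily, with no $\beta$-memory of partial matches. This is a standard technique for chain-structured constraints. General rule patterns, with non-chain joins, need more machinery than this.

```python
from typing import Callable, Dict, Hashable, Iterator, List, Sequence, Tuple

Value = Hashable
# preds[i] maps each surviving candidate for variable i to its surviving
# candidates for variable i-1 (preds[0] maps candidates to empty lists).
Graph = List[Dict[Value, List[Value]]]


def build_forward_graph(
    domains: Sequence[Sequence[Value]],
    rel: Callable[[int, Value, Value], bool],  # hypothetical: rel(i, a, b) relates x_{i-1}=a to x_i=b
) -> Graph:
    """Forward pass over a toy chain pattern x_0 - x_1 - ... - x_{K-1}.

    Keeps only candidates reachable from layer 0 and records their
    reachable predecessors: at most N^2 edges per layer, O(K N^2) total.
    """
    reachable = list(dict.fromkeys(domains[0]))  # deduplicate, keep order
    preds: Graph = [{v: [] for v in reachable}]
    for i in range(1, len(domains)):
        layer: Dict[Value, List[Value]] = {}
        for b in domains[i]:
            ps = [a for a in reachable if rel(i, a, b)]
            if ps:  # drop candidates with no consistent predecessor
                layer[b] = ps
        preds.append(layer)
        reachable = list(layer)
    return preds


def iter_matches(preds: Graph) -> Iterator[Tuple[Value, ...]]:
    """Backward pass: lazily stream full matches from the last layer to layer 0.

    Every stored candidate has at least one stored predecessor, so the walk
    never dead-ends; each match costs O(K) steps and nothing else is cached.
    """
    def walk(i: int, v: Value, suffix: Tuple[Value, ...]) -> Iterator[Tuple[Value, ...]]:
        if i == 0:
            yield (v,) + suffix
            return
        for a in preds[i][v]:
            yield from walk(i - 1, a, (v,) + suffix)

    for v in preds[-1]:
        yield from walk(len(preds) - 1, v, ())
```

For example, with three variables over the values 0 to 3 and `rel` requiring strictly increasing values, `list(iter_matches(build_forward_graph([range(4)] * 3, lambda i, a, b: a < b)))` yields the four increasing triples. With 6 variables over 10 values and an always-true relation, the graph holds only 500 edges, even though there are $10^6$ matches. Calling `next(...)` on the iterator returns the first of them without enumerating the rest.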
