CORGI: Efficient Pattern Matching With Quadratic Guarantees

Daniel Weitekamp

TL;DR

Pattern matching in forward-chaining rule systems can suffer exponential $O(N^K)$ time and space in the worst case when rules have underconstrained variables. The authors present CORGI (Collection-Oriented Relational Graph Iteration), a two-phase approach that builds a forward relation graph and then generates matches by walking backward through mappings, avoiding storage of full conflict sets and achieving quadratic $O(KN^2)$ time/space for a single match. CORGI can stream subsequent matches without enumerating all possibilities, and it substantially outperforms RETE-based implementations (e.g., OPS5 and SOAR) on a combinatorial Valentine matching task. This work enables real-time cognitive systems and low-latency querying for rules learned or synthesized by AI agents, by providing robust guarantees and memory-efficient streaming of matches within the CRE toolset.

Abstract

Rule-based systems must solve complex matching problems within tight time constraints to be effective in real-time applications, such as planning and reactive control for AI agents, as well as low-latency relational database querying. Pattern-matching systems can encounter issues where exponential time and space are required to find matches for rules with many underconstrained variables, or which produce combinatorial intermediate partial matches (but are otherwise well-constrained). When online AI systems automatically generate rules from example-driven induction or code synthesis, they can easily produce worst-case matching patterns that slow or halt program execution by exceeding available memory. In our own work with cognitive systems that learn from example, we've found that aggressive forms of anti-unification-based generalization can easily produce these circumstances. To make these systems practical without hand-engineering constraints or succumbing to unpredictable failure modes, we introduce a new matching algorithm called CORGI (Collection-Oriented Relational Graph Iteration). Unlike RETE-based approaches, CORGI offers quadratic time and space guarantees for finding single satisficing matches, and the ability to iteratively stream subsequent matches without committing entire conflict sets to memory. CORGI differs from RETE in that it does not have a traditional $\beta$-memory for collecting partial matches. Instead, CORGI takes a two-step approach: a graph of grounded relations is built/maintained in a forward pass, and an iterator generates matches as needed by working backward through the graph. This approach eliminates the high-latency delays and memory overflows that can result from populating full conflict sets. In a performance evaluation, we demonstrate that CORGI significantly outperforms RETE implementations from SOAR and OPS5 on a simple combinatorial matching task.
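
The contrast between populating a full conflict set and streaming matches on demand can be illustrated with a minimal Python sketch. This is not the CORGI algorithm and carries none of its quadratic guarantees. It only shows the lazy-iteration idea, under the assumption that each rule variable already has a collection of candidate WME identifiers. The names `stream_matches` and `all_distinct`, and the collections `[15]` and `[5]`, are hypothetical; the collections `{11}` and `{1,2,4,6,7}` and the match `(15,5,11,1)` echo the Figure 1 caption.

```python
from itertools import product


def stream_matches(candidates, consistent):
    """Lazily yield one binding tuple at a time.

    `candidates` holds one collection of WME identifiers per rule variable
    (an assumed input format). Nothing is stored between yields, so a caller
    that needs a single match never materializes the full conflict set.
    """
    for combo in product(*candidates):
        if consistent(combo):
            yield combo


def all_distinct(combo):
    # Hypothetical consistency test: no WME is bound to two variables.
    return len(set(combo)) == len(combo)


# Per-variable candidate collections (illustrative values only).
candidates = [[15], [5], [11], [1, 2, 4, 6, 7]]

# Streaming: take the first satisficing match and stop.
first = next(stream_matches(candidates, all_distinct))

# Eager alternative: commit every match to memory up front.
conflict_set = list(stream_matches(candidates, all_distinct))

print(first)              # (15, 5, 11, 1)
print(len(conflict_set))  # 5
```

With $K$ variables and $N$ candidates each, the eager list can grow to $N^K$ entries, while the generator holds only the current combination. This naive filter can still scan many inconsistent combinations before it yields, which is the cost the relation graph described above is meant to avoid.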

Paper Structure

This paper contains 8 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Graphs for RETE (left) and Collection-oriented RETE (right) applied to the data in Table \ref{tab:wm} on page 5 for the Valentine's task described on the same page. $\beta$-memories are shown in dashed boxes below each $\beta$-node in the lower third of the figure. Each integer is a WME identifier (see Table \ref{tab:wm}). RETE produces all combinations of matches, like (15,5,11,1), whereas in Collection-oriented RETE, combinations can be produced on demand by selecting WMEs from collections like {11} and {1,2,4,6,7}.
  • Figure 2: CORGI relation graph for the Valentine's problem using data from Table \ref{tab:wm}. Black dashed boxes show collections of WME identifiers, on edges between nodes, that are still candidates for binding to a particular variable (e.g., (E)). Red dotted boxes show variable binding mappings (e.g., (11) $\leftarrow$ 5) for particular variables (e.g., (P) $\leftarrow$ E).
  • Figure 3: Log-scale runtimes for the first match cycle of the Valentine task for SOAR and OPS5 implementations of RETE, and for CORGI. (left) Varies the number of Valentines from 1 to 5. (right) Fixes $V=2$ Valentines and varies the number of objects in working memory.