SHARP: Shared State Reduction for Efficient Matching of Sequential Patterns

Cong Yu, Tuo Shi, Matthias Weidlich, Bo Zhao

TL;DR

SHARP addresses the efficient matching of many sequential patterns under strict latency bounds by exploiting state that is shared across patterns. It introduces the pattern-sharing degree (PSD) to encode and index overlapping sub-patterns, a lightweight cost model that estimates the contribution and overhead of each partial match, and a hierarchical, greedy state selector that reduces state while preserving recall. In experiments on CEP, OLAP (MATCH_RECOGNIZE), and GraphRAG workloads, SHARP achieves a recall of 97%, 96%, and 73%, respectively, under a latency bound of 50% of the average processing latency. It remains robust to changes in pattern properties and to concept drift, and it integrates with Neo4j-GraphRAG. SHARP outperforms the baselines and comes close to optimal state selection at far lower runtime cost.

Abstract

The detection of sequential patterns in data is a basic functionality of modern data processing systems for complex event processing (CEP), OLAP, and retrieval-augmented generation (RAG). In practice, pattern matching is challenging, since common applications rely on a large set of patterns that shall be evaluated with tight latency bounds. At the same time, matching needs to maintain state, i.e., intermediate results, that grows exponentially in the input size. Hence, systems turn to best-effort processing, striving for maximal recall under a latency bound. Existing techniques, however, consider each pattern in isolation, neglecting the optimization potential induced by state sharing in pattern matching. In this paper, we present SHARP, a library that employs state reduction to achieve efficient best-effort pattern matching. To this end, SHARP incorporates state sharing between patterns through a new abstraction, coined pattern-sharing degree (PSD). At runtime, this abstraction facilitates the categorization and indexing of partial pattern matches. Based thereon, once a latency bound is exceeded, SHARP realizes best-effort processing by selecting a subset of partial matches for further processing in constant time. In experiments with real-world data, SHARP achieves a recall of 97%, 96% and 73% for pattern matching in CEP, OLAP, and RAG applications, under a bound of 50% of the average processing latency.
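The idea of indexing partial matches by the patterns that share them, and then keeping only a subset once a latency bound is exceeded, can be illustrated with a minimal sketch. This is not SHARP's implementation: the class `SharedStateIndex`, its methods, and the use of a bitmask popcount as a stand-in for the cost model are all assumptions made for illustration. The sketch also sorts buckets at selection time, so it does not reproduce the constant-time selection described above.

```python
from collections import defaultdict


class SharedStateIndex:
    """Toy index (hypothetical) that buckets partial matches by the set of
    patterns sharing them, encoded as a bitmask over pattern ids."""

    def __init__(self):
        # bitmask of pattern ids -> partial matches shared by exactly those patterns
        self.buckets = defaultdict(list)

    def add(self, partial_match, pattern_ids):
        """Store a partial match under the bitmask of the patterns it serves."""
        mask = 0
        for pid in pattern_ids:
            mask |= 1 << pid
        self.buckets[mask].append(partial_match)

    def select(self, budget):
        """Greedily keep at most `budget` partial matches, most-shared first.

        Assumption: the number of sharing patterns (popcount of the mask)
        is used as a crude proxy for a partial match's contribution.
        """
        kept = []
        ranked = sorted(self.buckets, key=lambda m: bin(m).count("1"), reverse=True)
        for mask in ranked:
            room = budget - len(kept)
            if room <= 0:
                break
            kept.extend(self.buckets[mask][:room])
        return kept


# Usage: four partial matches shared by different subsets of patterns 0..2.
index = SharedStateIndex()
index.add("pm_a", [0])
index.add("pm_b", [0, 1])
index.add("pm_c", [0, 1, 2])
index.add("pm_d", [1])
survivors = index.select(budget=2)  # keeps the two most widely shared matches
```

Bucketing by bitmask means that a selection decision is made per group of patterns rather than per pattern, which is the intuition behind treating shared state as a unit.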

Paper Structure

This paper contains 34 sections, 3 equations, 25 figures, 1 table, 5 algorithms.

Figures (25)

  • Figure 1: An example of how shared patterns enhance the end-to-end responses of GraphRAG and the underlying performance challenge
  • Figure 2: The execution plan DAG$^b$ of (a) separate single patterns and (b) multiple shared patterns
  • Figure 3: System architecture of SHARP (the example execution plan is identical to Fig. \ref{fig:pattern-graph})
  • Figure 4: The overall performance of shared CEP patterns P$_3$-P$_4$ over DS1 at different latency bounds
  • Figure 5: The overall performance of shared CEP patterns P$_3$-P$_4$ over Citi_Bike at different latency bounds
  • ...and 20 more figures