SHARP: Shared State Reduction for Efficient Matching of Sequential Patterns

Cong Yu; Tuo Shi; Matthias Weidlich; Bo Zhao

TL;DR

SHARP tackles the problem of efficiently matching many sequential patterns under strict latency bounds by exploiting state that is shared across patterns. It introduces the pattern-sharing degree (PSD) to encode and index overlapping sub-patterns, a lightweight cost model to estimate the contribution and overhead of each partial match, and a hierarchical, greedy state selector that reduces state while preserving recall. In experiments on CEP, OLAP (MATCH_RECOGNIZE), and GraphRAG workloads, SHARP achieves high recall under a latency bound of 50% of the average processing latency, remains robust to varying pattern properties and concept drift, and shows practical value through an integration with Neo4j-GraphRAG. The approach yields substantial performance gains over baselines and comes close to optimal state selection at far lower runtime cost, enabling scalable, latency-aware processing of pattern workloads in real-world data systems.

Abstract

The detection of sequential patterns in data is a basic functionality of modern data processing systems for complex event processing (CEP), OLAP, and retrieval-augmented generation (RAG). In practice, pattern matching is challenging, since common applications rely on a large set of patterns that must be evaluated under tight latency bounds. At the same time, matching needs to maintain state, i.e., intermediate results, that grows exponentially in the input size. Hence, systems turn to best-effort processing, striving for maximal recall under a latency bound. Existing techniques, however, consider each pattern in isolation, neglecting the optimization potential induced by state sharing in pattern matching. In this paper, we present SHARP, a library that employs state reduction to achieve efficient best-effort pattern matching. To this end, SHARP incorporates state sharing between patterns through a new abstraction, coined pattern-sharing degree (PSD). At runtime, this abstraction facilitates the categorization and indexing of partial pattern matches. Based on this index, once a latency bound is exceeded, SHARP realizes best-effort processing by selecting, in constant time, a subset of partial matches for further processing. In experiments with real-world data, SHARP achieves a recall of 97%, 96%, and 73% for pattern matching in CEP, OLAP, and RAG applications, respectively, under a bound of 50% of the average processing latency.
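To make the idea of sharing-aware state reduction more concrete, the following is a minimal sketch and not SHARP's actual implementation. It assumes that each partial match records the set of patterns it can still contribute to, treats the size of that set as a stand-in for the pattern-sharing degree, and greedily keeps the most widely shared partial matches within a cost budget. The names `PartialMatch`, `shared_by`, `overhead`, and `select_state` are hypothetical, and this sketch sorts the buckets, so it does not reproduce the constant-time selection described in the abstract.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialMatch:
    match_id: str
    shared_by: frozenset  # hypothetical: ids of patterns this match can still complete
    overhead: float       # hypothetical: estimated cost of keeping this match


def sharing_degree(pm):
    # Stand-in for the PSD: the number of patterns that share this partial match.
    return len(pm.shared_by)


def index_by_degree(matches):
    # Bucket partial matches by sharing degree so selection can walk the buckets.
    buckets = defaultdict(list)
    for pm in matches:
        buckets[sharing_degree(pm)].append(pm)
    return buckets


def select_state(matches, budget):
    # Greedy selection: visit buckets from highest to lowest sharing degree and,
    # within a bucket, keep the cheapest matches that still fit into the budget.
    buckets = index_by_degree(matches)
    kept, spent = [], 0.0
    for degree in sorted(buckets, reverse=True):
        for pm in sorted(buckets[degree], key=lambda m: m.overhead):
            if spent + pm.overhead <= budget:
                kept.append(pm)
                spent += pm.overhead
    return kept
```

In this sketch, calling `select_state(matches, budget)` after a latency bound is exceeded returns the partial matches to retain, and everything else would be discarded. Preferring high sharing degrees reflects the intuition that a partial match serving several patterns protects more potential results per unit of retained state.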
