A Scalable and Near-Optimal Conformance Checking Approach for Long Traces
Eli Bogdanov, Izack Cohen, Avigdor Gal
TL;DR
This work tackles conformance checking for long traces by partitioning a trace of length $N$ into subtraces of length $L$ and solving the alignment within a sliding window, so the search is confined to $W=\lceil N/L\rceil$ windows instead of the full trace. It introduces a pruning mechanism driven by global information, in the form of a marginal-cost lower bound, and maintains the model state across subtraces to preserve coherence, enabling near-optimal alignments at scale. The approach is formalized with trace/subtrace models and a cost framework, analyzed for complexity, and validated on classic datasets and on long-trace food-preparation datasets, achieving optimal alignments in over $96\%$ of traces with a small average deviation of $0.66\%$. The resulting method offers scalable, interpretable conformance checking for large process logs generated by sensors and prediction models, with practical impact on real-world process mining tasks.
Abstract
Long traces and large event logs that originate from sensors and prediction models are becoming more common in our data-rich world. In such circumstances, conformance checking, a key task in process mining, can become computationally infeasible due to the exponential complexity of finding an optimal alignment. This paper introduces a novel sliding window approach to address these scalability challenges while preserving the interpretability of alignment-based methods. By breaking down traces into manageable subtraces and iteratively aligning each with the process model, our method significantly reduces the search space. The approach uses global information that captures structural properties of the trace and the process model to make informed alignment decisions, discarding unpromising alignments even if they are optimal for a local subtrace, which improves the overall accuracy of the results. Experimental evaluations demonstrate that the proposed method finds optimal alignments in most cases and scales to long traces. This is further supported by a theoretical complexity analysis, which shows the reduced growth of the search space compared to other common conformance checking methods. This work provides a valuable contribution towards efficient conformance checking for large-scale process mining applications.
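
As a rough illustration of the partitioning step only, the sketch below splits a trace into consecutive subtraces of a fixed length and checks that the number of windows equals $\lceil N/L\rceil$. The function name `partition_trace`, the parameter `window_len`, and the use of non-overlapping windows are assumptions made for this example; the alignment of each subtrace, the carried-over model state, and the lower-bound pruning are not shown.

```python
from math import ceil
from typing import List, Sequence


def partition_trace(trace: Sequence[str], window_len: int) -> List[List[str]]:
    """Split a trace into consecutive subtraces of at most `window_len` events.

    Hypothetical helper for illustration; it assumes non-overlapping windows.
    """
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    return [
        list(trace[start:start + window_len])
        for start in range(0, len(trace), window_len)
    ]


# A toy trace with N = 7 events, split with L = 3.
trace = ["a", "b", "c", "d", "e", "f", "g"]
windows = partition_trace(trace, 3)

# The number of windows matches W = ceil(N / L).
assert len(windows) == ceil(len(trace) / 3)
```

In this toy case the last window is shorter than $L$, which is why the window count uses a ceiling rather than integer division.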
