Log Optimization Simplification Method for Predicting Remaining Time
Jianhong Ye, Siyuan Zhang, Yan Lin
TL;DR
The paper tackles remaining-time prediction from event logs by introducing a log-simplification framework guided by Resource Community Networks to choose non-deletable prediction points and prevent loss of predictive power. It couples substructure-based reduction rules within Generalised Stochastic Petri Nets with a pre-emptive log-optimization step that balances data reduction against forecast deviation using per-substructure metrics $k_i$ and $\mu_i$ under a constraint with $\Gamma$ and slack $g$. The approach is validated on a sepsis-care event log, showing that carefully selected simplifications can preserve or even improve prediction accuracy while substantially reducing data size. This has practical impact for predictive process monitoring by enabling faster, lighter-weight models without sacrificing reliability, and it opens avenues for applying the framework to other performance-prediction tasks. The core contributions are the prediction-point constraint via RCN, the substructure-based reduction rules, and the optimization-assisted log simplification, all integrated into a coherent remaining-time prediction pipeline.
Abstract
Information systems generate large volumes of event log data during business operations, much of which is low-value or redundant. Performance predictions made directly from these logs can therefore lose accuracy. Researchers have explored methods to simplify and compress such data while preserving its valuable components, and most existing approaches reduce dimensionality by eliminating redundant and irrelevant features. However, execution efficiency before and after event log simplification has received limited investigation. In this paper, we present a prediction point selection algorithm that prevents all points with a similar function from being simplified away. We select sequence or self-loop structures to form simplifiable segments, and we optimize the deviation between the value obtained after simplification and the prediction value from the original data to prevent over-simplification. Experiments indicate that the simplified event log retains its predictive performance and, in some cases, yields higher predictive accuracy than the original event log.
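To make the idea of a simplifiable self-loop segment concrete, the following is a minimal Python sketch, not the paper's reduction rules. It assumes a trace is a list of (activity, timestamp) pairs, merges runs of consecutive identical activities into one event, and never merges activities in a `protected` set, which stands in for the selected prediction points. The function names, the activity labels, and the choice to keep the timestamp of the last repetition are all illustrative assumptions.

```python
from typing import List, Set, Tuple

# Hypothetical event representation: (activity name, hours since case start).
Event = Tuple[str, float]


def collapse_self_loops(trace: List[Event], protected: Set[str]) -> List[Event]:
    """Merge runs of consecutive identical activities into a single event.

    Activities in `protected` (an assumed stand-in for prediction points)
    are never merged. A merged event keeps the timestamp of the last
    repetition, so the time at which the run completes is preserved.
    """
    simplified: List[Event] = []
    for activity, timestamp in trace:
        is_repeat = bool(simplified) and simplified[-1][0] == activity
        if is_repeat and activity not in protected:
            # Overwrite the previous event instead of appending a new one.
            simplified[-1] = (activity, timestamp)
        else:
            simplified.append((activity, timestamp))
    return simplified


def reduction_ratio(original: List[Event], simplified: List[Event]) -> float:
    """Fraction of events removed by simplification (0.0 for an empty trace)."""
    if not original:
        return 0.0
    return 1.0 - len(simplified) / len(original)


# Illustrative trace with made-up activity names and timestamps.
trace = [
    ("Register", 0.0),
    ("Lab test", 1.0),
    ("Lab test", 2.0),
    ("Lab test", 3.5),
    ("Triage", 4.0),
    ("Triage", 4.5),
    ("Release", 6.0),
]
simplified = collapse_self_loops(trace, protected={"Triage"})
ratio = reduction_ratio(trace, simplified)
```

In this sketch the three "Lab test" events collapse into one, while the two "Triage" events survive because the activity is protected. A fuller treatment would also compare predictions made on the original and simplified logs and reject simplifications whose deviation is too large, as the abstract describes.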
