An Innovative Data-Driven and Adaptive Reinforcement Learning Approach for Context-Aware Prescriptive Process Monitoring
Mostafa Abbasi, Maziyar Khadivi, Maryam Ahang, Patricia Lasserre, Yves Lucet, Homayoun Najjaran
TL;DR
This paper tackles context-aware prescriptive process monitoring under data scarcity by introducing FORLAPS, a Fine-Tuned Offline Reinforcement Learning framework that combines offline RL with process-aware data augmentation and state-dependent reward shaping. The approach learns best-next-activity policies from real event logs and reports substantial efficiency gains, reducing resource time by about 31% and total process duration by about 23%, and outperforming the LSTM and PFI (Permutation Feature Importance) baselines. Key contributions include a novel state-dependent reward shaping mechanism, a realistic process-aware augmentation technique, and an offline-to-offline fine-tuning strategy, validated on healthcare, finance, and regulatory processes using a Damerau-Levenshtein-based robustness metric. The work provides a practical, data-efficient pathway to deploying prescriptive process monitoring in diverse industries, and it outlines limitations and directions for automated reward design and more autonomous augmentation.
Abstract
The application of artificial intelligence and machine learning in business process management has advanced significantly; however, the full potential of these technologies remains largely unexplored, primarily due to challenges related to data quality and availability. We present a novel framework called Fine-Tuned Offline Reinforcement Learning Augmented Process Sequence Optimization (FORLAPS), which aims to identify optimal execution paths in business processes by leveraging reinforcement learning enhanced with a state-dependent reward shaping mechanism, thereby enabling context-sensitive prescriptions. To evaluate its effectiveness in terms of resource savings and process time reduction, we conducted experiments comparing FORLAPS with two existing models: Permutation Feature Importance and a multi-task Long Short-Term Memory model. The experimental results on real-life event logs show that FORLAPS achieves 31% savings in resource time spent and a 23% reduction in process time span. To further enhance learning, we introduce an innovative process-aware data augmentation technique that selectively increases the average estimated Q-values in sampled batches, enabling automatic fine-tuning of the reinforcement learning model. Robustness was assessed through both prefix-level and trace-level evaluations, using the Damerau-Levenshtein distance as the primary metric. Finally, the model's adaptability across industries was validated through diverse case studies, including healthcare treatment pathways, financial services workflows, permit applications from regulatory bodies, and operations management. In each domain, the proposed model outperformed existing state-of-the-art approaches in prescriptive decision-making, demonstrating its capability to prescribe the best next activities within a process trace.
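To make the robustness metric concrete, the following is a minimal sketch of a Damerau-Levenshtein comparison between two activity sequences. It implements the restricted variant (optimal string alignment), in which an adjacent pair of activities may be swapped at most once; whether FORLAPS uses this variant or the unrestricted one, and how it normalizes the result, is not stated here, so both choices are assumptions. The function names and the activity labels are hypothetical.

```python
from typing import Sequence


def osa_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts the insertions, deletions, substitutions, and adjacent
    transpositions needed to turn sequence `a` into sequence `b`.
    """
    n, m = len(a), len(b)
    # d[i][j] = distance between the first i items of a and first j of b.
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i items
    for j in range(m + 1):
        d[0][j] = j  # insert all j items

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. (..., X, Y) versus (..., Y, X).
            if (
                i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[n][m]


def normalized_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Map the distance to [0, 1] by dividing by the longer length.

    This normalization is an assumption made for illustration.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - osa_distance(a, b) / longest


# Hypothetical traces: a prescribed path and an observed path in which
# two adjacent activities are swapped.
prescribed = ["register", "triage", "lab_test", "discharge"]
observed = ["register", "lab_test", "triage", "discharge"]
print(osa_distance(prescribed, observed))
print(normalized_similarity(prescribed, observed))
```

Treating a swap of two adjacent activities as a single edit is what distinguishes this metric from plain Levenshtein distance, which would count the same reordering as two substitutions. The same function applies to a prefix-level comparison by passing truncated sequences.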
