Accurate and Noise-Tolerant Extraction of Routine Logs in Robotic Process Automation (Extended Version)

Massimiliano de Leoni, Faizan Ahmed Khan, Simone Agostinelli

TL;DR

This work tackles the challenge of deriving accurate routine logs from UI logs to enable automatic discovery of routine-type models for Robotic Process Automation. It introduces a noise-tolerant, clustering-based pipeline that segments UI logs at predefined completion actions, encodes each routine execution as an action-count vector, and clusters these vectors with $K$-Means, DBSCAN, or HDBSCAN to produce distinct routine logs. Evaluated on nine synthetic UI logs with varying noise levels and on a real-world UI log, the approach achieves a higher Jaccard Coefficient ($JC$) and alignment-based $Fitness$ than state-of-the-art baselines and produces fewer empty logs, indicating robustness to human variability. The method advances practical routine-type model discovery in Robotic Process Mining (RPM) by providing a scalable, flexible pipeline that improves the quality of discovered routine logs and, consequently, the accuracy of downstream process models.
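The three steps summarized above (segmentation at completion actions, action-count encoding, clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: the action names, the function names `segment`, `encode`, and `kmeans`, the choice to drop trailing actions that never reach a completion action, and the toy deterministic K-Means are all assumptions made for illustration. A real pipeline would more likely use a library implementation of K-Means, DBSCAN, or HDBSCAN.

```python
from collections import Counter


def segment(ui_log, completion_actions):
    """Split a flat list of UI actions, closing a segment at each completion action."""
    segments, current = [], []
    for action in ui_log:
        current.append(action)
        if action in completion_actions:
            segments.append(current)
            current = []
    # Assumption: trailing actions with no completion action are discarded.
    return segments


def encode(segments):
    """Encode each segment as a vector of action counts over a shared sorted vocabulary."""
    vocab = sorted({action for seg in segments for action in seg})
    vectors = []
    for seg in segments:
        counts = Counter(seg)
        vectors.append([counts[action] for action in vocab])
    return vocab, vectors


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


def kmeans(vectors, k, iterations=20):
    """Toy K-Means seeded with the first k distinct vectors (illustrative only)."""
    centroids = []
    for v in vectors:
        if v not in centroids:
            centroids.append(list(v))
        if len(centroids) == k:
            break
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical UI log interleaving two routine types, with one noisy repetition.
ui_log = [
    "open_form", "fill_name", "submit",
    "open_mail", "attach", "send",
    "open_form", "fill_name", "fill_name", "submit",
    "open_mail", "attach", "send",
]
segments = segment(ui_log, completion_actions={"submit", "send"})
vocab, vectors = encode(segments)
labels = kmeans(vectors, k=2)

# Group the segments by cluster label: one routine log per cluster.
routine_logs = {}
for seg, label in zip(segments, labels):
    routine_logs.setdefault(label, []).append(seg)
```

In this toy log, the segment with a repeated `fill_name` still lands in the same cluster as the clean form-filling segment, which is the kind of tolerance to execution variability the summary describes.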

Abstract

Robotic Process Mining focuses on the identification of the routine types performed by human resources through a User Interface. The ultimate goal is to discover routine-type models to enable robotic process automation. The discovery of routine-type models requires the provision of a routine log. Unfortunately, the vast majority of existing works do not directly focus on enabling the model discovery, limiting themselves to extracting the set of actions that are part of the routines. They were also not evaluated in scenarios characterized by inconsistent routine execution, hereafter referred to as noise, which reflects natural variability and occasional errors in human performance. This paper presents a clustering-based technique that aims to extract routine logs. Experiments were conducted on nine UI logs from the literature with different levels of injected noise. Our technique was compared with existing techniques, most of which are not meant to discover routine logs but were adapted for the purpose. The results were evaluated through standard state-of-the-art metrics, showing that our technique extracts more accurate routine logs than the state of the art, especially in the presence of noise.

Paper Structure

This paper contains 13 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Example illustrating the preliminary concepts: transformation from a UI log to multiple routine logs of a specific routine type.
  • Figure 2: Overview of the proposed technique for routine log discovery.
  • Figure 3: Trend of the JC and fitness values for the different techniques evaluated, under varying noise levels. The dotted lines represent state-of-the-art techniques, while the solid lines correspond to our techniques using different clustering methods.
  • Figure 4: Average Jaccard Coefficient (JC) values across all techniques. The x-axis lists the evaluated methods, while the y-axis shows the average JC score (0–1). Higher values indicate greater similarity between the extracted routines and the ground-truth routines.
  • Figure 5: Average Fitness values across all techniques. The x-axis lists the evaluated methods, while the y-axis shows the average fitness score (0–1). Higher values indicate stronger conformance of the discovered routine logs with the ground-truth models.

Theorems & Definitions (4)

  • Definition: UI Log
  • Definition: The Problem of Clustering Routine Executions
  • Definition: Jaccard Coefficient
  • Definition: Fitness
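
The paper's own definition of the Jaccard Coefficient is not reproduced in this listing. As a rough illustration only, the standard set-based Jaccard index, |A ∩ B| / |A ∪ B|, can be computed as below. Applying it to the set of actions in an extracted routine versus a ground-truth routine, the function name `jaccard`, the action names, and the convention of returning 1.0 for two empty sets are all assumptions.

```python
def jaccard(extracted, ground_truth):
    """Standard Jaccard index of two collections, treated as sets."""
    a, b = set(extracted), set(ground_truth)
    if not a and not b:
        # Assumed convention: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical example: the extracted routine contains one spurious action.
extracted_actions = {"open_form", "fill_name", "submit", "scroll"}
ground_truth_actions = {"open_form", "fill_name", "submit"}
score = jaccard(extracted_actions, ground_truth_actions)  # 3 shared / 4 total = 0.75
```

The score lies in the range 0 to 1, which matches the axis range given in the Figure 4 caption.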