
A Process Mining-Based System For The Analysis and Prediction of Software Development Workflows

Antía Dorado, Iván Folgueira, Sofía Martín, Gonzalo Martín, Álvaro Porto, Alejandro Ramos, John Wallace

TL;DR

CodeSight addresses the lack of fine-grained visibility in software development pipelines by unifying GitHub-derived event data with process mining and predictive analytics. It builds structured event logs, applies process mining to reveal workflow variants and bottlenecks, and uses an LSTM model to predict the remaining resolution time of pull requests (PRs) and assess deadline compliance. The approach achieves high predictive performance (e.g., test accuracy ~0.944 and F1 ~0.963 for deadline compliance) and provides actionable dashboards that support proactive project management. This end-to-end pipeline shows how process-oriented representations paired with deep learning can turn DevOps data into timely decision-support insights and practical workflow improvements.
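The event-log and process-mining steps can be illustrated with a minimal sketch. It assumes each GitHub event has already been flattened into a (case_id, activity, timestamp) record with the PR number as the case identifier; the activity names (`open`, `commit`, `review`, `merge`), the sample events, and the helper functions are hypothetical and not CodeSight's actual schema. The sketch groups events into per-PR traces, counts workflow variants (distinct activity sequences), and uses the mean waiting time on each directly-follows transition as a simple bottleneck indicator.

```python
from collections import Counter, defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical flattened GitHub events: (case_id = PR number, activity, ISO timestamp).
EVENTS = [
    (101, "open",   "2024-03-01T09:00:00"),
    (101, "review", "2024-03-01T15:00:00"),
    (101, "merge",  "2024-03-02T09:00:00"),
    (102, "open",   "2024-03-02T10:00:00"),
    (102, "commit", "2024-03-02T12:00:00"),
    (102, "review", "2024-03-04T12:00:00"),
    (102, "merge",  "2024-03-04T18:00:00"),
    (103, "open",   "2024-03-03T08:00:00"),
    (103, "review", "2024-03-03T10:00:00"),
    (103, "merge",  "2024-03-03T11:00:00"),
]

def build_traces(events):
    """Group events by case (PR) and sort each case chronologically."""
    traces = defaultdict(list)
    for case_id, activity, ts in events:
        traces[case_id].append((activity, datetime.fromisoformat(ts)))
    return {cid: sorted(evts, key=lambda e: e[1]) for cid, evts in traces.items()}

def variant_counts(traces):
    """Count distinct activity sequences, i.e. workflow variants."""
    return Counter(tuple(a for a, _ in evts) for evts in traces.values())

def transition_waits(traces):
    """Mean hours elapsed on each directly-follows transition (a -> b)."""
    waits = defaultdict(list)
    for evts in traces.values():
        for (a, t1), (b, t2) in zip(evts, evts[1:]):
            waits[(a, b)].append((t2 - t1).total_seconds() / 3600)
    return {pair: mean(hours) for pair, hours in waits.items()}

traces = build_traces(EVENTS)
variants = variant_counts(traces)
waits = transition_waits(traces)
bottleneck = max(waits, key=waits.get)  # transition with the longest mean wait
print(variants.most_common(1))          # [(('open', 'review', 'merge'), 2)]
print(bottleneck, waits[bottleneck])    # ('commit', 'review') 48.0
```

A real deployment would more likely delegate model discovery and performance analysis to a process mining library; the sketch only shows the shape of the data those steps operate on.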

Abstract

CodeSight is an end-to-end system designed to anticipate deadline compliance in software development workflows. It captures development and deployment data directly from GitHub and transforms it into event logs for process mining analysis. From these logs, the system generates metrics and dashboards that provide actionable insights into pull request (PR) activity patterns and workflow efficiency. Building on this structured representation, CodeSight employs an LSTM model that predicts the remaining resolution time of a PR from its sequential activity trace and static features, enabling early identification of potential deadline breaches. In tests, the system achieves high precision and F1 scores in predicting deadline compliance, illustrating the value of integrating process mining with machine learning for proactive software project management.
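How sequential traces become supervised targets for remaining-time prediction, and how a predicted remaining time can be turned into a deadline-compliance decision, can be sketched as follows. This is an illustration under stated assumptions, not CodeSight's implementation: the LSTM (which would consume the full activity prefix plus static features) is replaced by a hypothetical stand-in, `fit_baseline`, that averages remaining time by the prefix's last activity; the compliance rule (elapsed plus predicted remaining time at most the deadline) is assumed; and all traces and labels are invented. Precision, recall, and F1 are computed on the "compliant" class.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical completed PR traces: (activity, hours since the PR was opened).
TRAIN_TRACES = [
    [("open", 0.0), ("review", 6.0), ("merge", 24.0)],
    [("open", 0.0), ("commit", 2.0), ("review", 50.0), ("merge", 56.0)],
    [("open", 0.0), ("review", 2.0), ("merge", 3.0)],
]

def prefix_samples(trace):
    """Yield (activity prefix, elapsed hours, remaining hours) for each proper prefix."""
    end = trace[-1][1]
    for k in range(1, len(trace)):
        elapsed = trace[k - 1][1]
        yield [a for a, _ in trace[:k]], elapsed, end - elapsed

def fit_baseline(traces):
    """Stand-in for the LSTM: mean remaining time keyed by the prefix's last activity."""
    by_last = defaultdict(list)
    for trace in traces:
        for acts, _, remaining in prefix_samples(trace):
            by_last[acts[-1]].append(remaining)
    return {act: mean(vals) for act, vals in by_last.items()}

def predict_compliant(model, activities, elapsed, deadline_hours):
    """Compliant if elapsed + predicted remaining time fits within the deadline."""
    # Unseen activities fall back to the largest learned mean (a conservative guess).
    remaining = model.get(activities[-1], max(model.values()))
    return elapsed + remaining <= deadline_hours

def precision_recall_f1(y_true, y_pred, positive=True):
    """Standard binary precision, recall, and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

model = fit_baseline(TRAIN_TRACES)

# Hypothetical running PRs: (prefix, elapsed hours, deadline hours, actually compliant?).
RUNNING = [
    (["open"], 1.0, 48.0, True),
    (["open", "review"], 30.0, 36.0, True),
    (["open", "commit"], 4.0, 40.0, False),
    (["open", "review"], 38.0, 48.0, False),
    (["open", "commit", "review"], 52.0, 72.0, True),
]
y_pred = [predict_compliant(model, acts, el, dl) for acts, el, dl, _ in RUNNING]
y_true = [ok for *_, ok in RUNNING]
p, r, f1 = precision_recall_f1(y_true, y_pred)
print(y_pred)                               # [True, False, False, True, True]
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.667 0.667 0.667
```

Deriving the binary decision from a regressed remaining time, rather than training a separate classifier, lets one model serve both the remaining-time estimate and the deadline check; whether CodeSight makes exactly this choice is not stated here.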

Paper Structure

This paper contains 31 sections, 1 equation, 5 figures, and 4 tables.

Figures (5)

  • Figure 1: CodeSight architecture and data processing workflow.
  • Figure 2: Conversion of raw data
  • Figure 3: Simplified process model.
  • Figure 6: Architecture of the LSTM model
  • Figure 7: Test confusion matrix