Process Mining Embeddings: Learning Vector Representations for Petri Nets

Juan G. Colonna; Ahmed A. Fares; Márcio Duarte; Ricardo Sousa

TL;DR

This paper tackles the challenge of comparing complex Petri-net process models by learning vector embeddings that encode both structure and behavior. It introduces PetriNet2Vec, an unsupervised framework inspired by doc2vec that maps Petri nets and their constituent tasks into $d$-dimensional vectors, trained on transition contexts $(t_i, t_{i+1}, m_j)$ with negative sampling. The approach yields both model and task embeddings, which support downstream tasks such as process retrieval and classification. It is validated on the PDC2023 dataset of 96 PNML models, with analyses showing cohesive clusters and interpretable formation rules. The work demonstrates practical clustering and retrieval capabilities for process mining, discusses applying the method to real-world data and extending the contextual scope to capture deeper temporal dependencies, and provides open-source tooling.
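To make the notion of a transition context concrete, the sketch below builds triples $(t_i, t_{i+1}, m_j)$ from a Petri net. It assumes that the directed graph of transitions links $t_i$ to $t_{i+1}$ whenever some place is an output of $t_i$ and an input of $t_{i+1}$, and that $m_j$ is simply an identifier of the model the pair came from. The data layout and the helper names `transition_graph` and `training_contexts` are hypothetical illustrations, not the authors' implementation, and the sketch ignores details such as silent transitions or PNML parsing.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[str, str]  # (source node, target node) in a place/transition net


def transition_graph(arcs: List[Arc], transitions: Set[str]) -> Set[Tuple[str, str]]:
    """Collapse places: add t_i -> t_k whenever some place p has t_i -> p -> t_k."""
    producers: Dict[str, Set[str]] = {}  # place -> transitions that output to it
    consumers: Dict[str, Set[str]] = {}  # place -> transitions that consume from it
    for src, dst in arcs:
        if src in transitions and dst not in transitions:
            producers.setdefault(dst, set()).add(src)
        elif src not in transitions and dst in transitions:
            consumers.setdefault(src, set()).add(dst)
    edges: Set[Tuple[str, str]] = set()
    for place, feeding in producers.items():
        for t_i in feeding:
            for t_next in consumers.get(place, set()):
                edges.add((t_i, t_next))
    return edges


def training_contexts(
    models: Dict[str, Tuple[List[Arc], Set[str]]]
) -> List[Tuple[str, str, str]]:
    """Emit one (t_i, t_next, model_id) triple per edge of each model's transition graph."""
    contexts: List[Tuple[str, str, str]] = []
    for model_id, (arcs, transitions) in sorted(models.items()):
        for t_i, t_next in sorted(transition_graph(arcs, transitions)):
            contexts.append((t_i, t_next, model_id))
    return contexts
```

For a toy net in which task `a` is followed by an exclusive choice between `b` and `c`, the place shared by the three tasks yields the two contexts `("a", "b", "m0")` and `("a", "c", "m0")`; these triples would then play the role that (word, next word, document ID) plays in doc2vec.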

Abstract

Process Mining offers a powerful framework for uncovering, analyzing, and optimizing real-world business processes, and Petri nets provide a versatile means of modeling process behavior. However, traditional methods often struggle to compare complex Petri nets effectively, which limits their usefulness for process enhancement. To address this challenge, we introduce PetriNet2Vec, an unsupervised methodology inspired by Doc2Vec that converts Petri nets into embedding vectors, facilitating the comparison, clustering, and classification of process models. We validated our approach on the PDC Dataset, which comprises 96 diverse Petri net models. The results show that PetriNet2Vec captures the structural properties of process models, and they highlight the utility of the learned embeddings in two key downstream tasks. In process classification, the embeddings allowed accurate categorization of process models based on their structural properties. In process retrieval, they enabled efficient retrieval of similar process models using cosine distance. These results demonstrate the potential of PetriNet2Vec to enhance process mining capabilities.
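Retrieval by cosine distance, as described above, amounts to ranking stored model embeddings by $1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$ against a query embedding. The following is a minimal sketch of that ranking in plain Python; the function names and the two-dimensional toy vectors are made up for illustration and are not embeddings or results from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for identical directions, 1 for orthogonal vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / norm


def retrieve(
    query: Sequence[float], embeddings: Dict[str, Sequence[float]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k model IDs closest to the query, nearest first."""
    scored = [(model_id, cosine_distance(query, vec)) for model_id, vec in embeddings.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:k]


# Toy, hand-written vectors (not learned embeddings).
toy = {"m0": [1.0, 0.0], "m1": [0.9, 0.1], "m2": [0.0, 1.0]}
nearest = [model_id for model_id, _ in retrieve([1.0, 0.0], toy, k=2)]  # ["m0", "m1"]
```

A linear scan is enough for a collection of 96 models; for much larger collections an approximate nearest-neighbour index would be the usual substitute, at the cost of exactness.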

Paper Structure (19 sections, 1 equation, 12 figures, 2 tables)

Introduction
Problem statement
Contributions and overview
Related works
Behavioral analysis
Structural analysis
Task comparison
Conclusions on Related Work
Background
Learning embeddings with doc2vec and graph2vec
Cluster algorithm
Process models dataset
Methodology for Learning Petri Net Embeddings
Results
Cluster analysis
...and 4 more sections

Figures (12)

Figure 1: The word2vec subfigure depicts the CBOW approach for word2vec. The doc2vec subfigure demonstrates the incorporation of the document ID from which the words were sampled. The graph2vec subfigure illustrates a graph where the yellow nodes represent the 'context' of node $w_i$, following the same nomenclature used in doc2vec for the words within the context of $w_i$.
Figure 2: Illustration of the rules applied to a process model. Red dots indicate the effect caused by each rule. (A) shows a bypass connection, (B) shows a loop, (C) shows an OR-construct, (D) shows an invisible task, (E) shows an optional task, and (F) shows a duplicated task and demonstrates an AND split.
Figure 3: Proposed methodology. On the left, a Petri net representation of a process model; in the middle, an equivalent representation as a directed graph of transitions; on the right, a Distributed Memory algorithm used for jointly learning embeddings for the process and tasks.
Figure 4: Caption for this figure with two images
Figure 5: Visual assessment. Left: Silhouette plot of model clusters with average silhouette indicated. Right: UMAP projection of process models with cluster colors.
...and 7 more figures
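Figures 1 and 3 describe learning model and task embeddings jointly with doc2vec's Distributed Memory algorithm. The toy class below sketches one negative-sampling update for a context $(t_i, t_{i+1}, m_j)$. It rests on stated assumptions: the vectors of $t_i$ and $m_j$ are averaged (doc2vec also allows concatenation) to predict $t_{i+1}$, negatives are drawn uniformly rather than from a smoothed frequency distribution, and the class name `PVDMNegativeSampling` and its methods are hypothetical. It is a teaching sketch, not the authors' code; in practice one would more likely use an existing implementation such as gensim's `Doc2Vec` with `dm=1` and `negative` > 0.

```python
import math
import random
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class PVDMNegativeSampling:
    """Toy Distributed Memory model: mean(task t_i, model m_j) predicts the next task."""

    def __init__(self, tasks: List[str], models: List[str], dim: int = 8, seed: int = 0):
        if len(tasks) < 2:
            raise ValueError("negative sampling needs at least two tasks")
        self.rng = random.Random(seed)
        self.dim = dim
        self.tasks = list(tasks)
        self.task_in: Dict[str, List[float]] = {t: self._init() for t in tasks}    # input task vectors
        self.model_vec: Dict[str, List[float]] = {m: self._init() for m in models}  # model (document) vectors
        self.task_out: Dict[str, List[float]] = {t: [0.0] * dim for t in tasks}    # output weights

    def _init(self) -> List[float]:
        return [self.rng.uniform(-0.5, 0.5) / self.dim for _ in range(self.dim)]

    def _hidden(self, t_i: str, m_j: str) -> List[float]:
        return [(a + b) / 2.0 for a, b in zip(self.task_in[t_i], self.model_vec[m_j])]

    def score(self, t_i: str, t_next: str, m_j: str) -> float:
        """Predicted probability that t_next follows t_i within model m_j."""
        h = self._hidden(t_i, m_j)
        return sigmoid(sum(a * b for a, b in zip(h, self.task_out[t_next])))

    def step(self, t_i: str, t_next: str, m_j: str, negatives: int = 2, lr: float = 0.05) -> float:
        """One SGD update on a single context; returns the loss before the update."""
        h = self._hidden(t_i, m_j)
        pool = [t for t in self.tasks if t != t_next]  # uniform negatives, excluding the target
        samples = [(t_next, 1.0)] + [(self.rng.choice(pool), 0.0) for _ in range(negatives)]
        grad_h = [0.0] * self.dim
        loss = 0.0
        for target, label in samples:
            out = self.task_out[target]
            p = sigmoid(sum(a * b for a, b in zip(h, out)))
            loss -= math.log(max(p if label else 1.0 - p, 1e-12))
            g = lr * (label - p)
            for k in range(self.dim):
                grad_h[k] += g * out[k]  # uses the output vector before it is updated
                out[k] += g * h[k]
        # The hidden vector is a mean of two inputs, so each receives half the gradient.
        for vec in (self.task_in[t_i], self.model_vec[m_j]):
            for k in range(self.dim):
                vec[k] += grad_h[k] / 2.0
        return loss
```

Looping `step` over every triple produced from the 96 models would yield, after training, one vector per model in `model_vec` (usable for clustering and retrieval) and one per task in `task_in`.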
