
Differentially Private Inductive Miner

Max Schulze, Yorck Zisgen, Moritz Kirschte, Esfandiar Mohammadi, Agnes Koschmider

TL;DR

The paper tackles the privacy risks of event traces in process mining by introducing the Differentially Private Inductive Miner (DPIM), a DP-based approximation of the Inductive Miner that learns a process structure tree (PST) from event logs by privatizing the directly-follows relation (DP-DFR). It proves $\varepsilon$-differential privacy for the PST construction and demonstrates through an evaluation on 14 real-world logs that DPIM preserves high utility (e.g., fitness around $0.95$, precision around $0.9$, simplicity around $0.7$, generalization around $0.8$) while providing a formal bound on privacy leakage. The method relies on rejection sampling and DP mechanisms (e.g., Laplace noise) applied to trace-derived features, ensuring that no single trace disproportionately influences the resulting model. This enables privacy-preserving process discovery and the potential generation of synthetic logs, reducing re-identification risk in shared process models and supporting safer data analysis and dissemination.
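
To make the idea of privatizing the directly-follows relation concrete, the following Python sketch releases, for every ordered activity pair, a noisy count of the traces in which the second activity directly follows the first. It is a hypothetical illustration, not the paper's DP-DFR construction: the public activity alphabet `activities`, the per-trace cap `max_pairs`, and the function names are assumptions, and the paper's use of rejection sampling is not reproduced here.

```python
import random
from itertools import product
from typing import Dict, Iterable, List, Sequence, Tuple

Pair = Tuple[str, str]


def directly_follows_pairs(trace: Sequence[str]) -> List[Pair]:
    """Distinct (a, b) pairs where b immediately follows a, in first-seen order."""
    return list(dict.fromkeys(zip(trace, trace[1:])))


def dp_directly_follows_counts(
    log: Iterable[Sequence[str]],
    activities: Sequence[str],
    epsilon: float,
    max_pairs: int,
    rng: random.Random,
) -> Dict[Pair, float]:
    """Hypothetical sketch: noisy per-pair counts of traces containing each pair.

    Each trace contributes +1 to at most `max_pairs` distinct pairs (extra pairs
    are dropped), so replacing one trace (bounded DP) changes the histogram by
    at most 2 * max_pairs in L1 norm. Independent Laplace noise with scale
    2 * max_pairs / epsilon on every cell then gives epsilon-DP.
    """
    if epsilon <= 0 or max_pairs <= 0:
        raise ValueError("epsilon and max_pairs must be positive")
    # Fixed, public domain: every ordered pair over the alphabet gets a value,
    # so the output does not reveal which transitions occur in the log.
    counts: Dict[Pair, int] = {pair: 0 for pair in product(activities, repeat=2)}
    for trace in log:
        for pair in directly_follows_pairs(trace)[:max_pairs]:
            if pair in counts:  # ignore activities outside the public alphabet
                counts[pair] += 1
    rate = epsilon / (2 * max_pairs)  # rate = 1 / Laplace scale
    # Difference of two i.i.d. exponentials with this rate is Laplace(0, 1/rate).
    return {
        pair: c + rng.expovariate(rate) - rng.expovariate(rate)
        for pair, c in counts.items()
    }
```

For example, `dp_directly_follows_counts([("A", "B", "C"), ("A", "C")], ["A", "B", "C"], epsilon=1.0, max_pairs=3, rng=random.Random(0))` returns one noisy value for each of the nine pairs over `{A, B, C}`. Releasing a value for every pair over a fixed alphabet, rather than only for the pairs observed in the log, avoids leaking which transitions occurred; the cap on distinct pairs per trace is what bounds the sensitivity.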

Abstract

Protecting personal data about individuals, such as event traces in process mining, is inherently difficult because an event trace leaks information about the path through a process model that an individual has triggered. Prior anonymization methods for event traces, such as k-anonymity or event log sanitization, struggle to protect against such leakage, in particular against adversaries with sufficient background knowledge. In this work, we provide a method that summarizes sensitive event traces by learning the underlying process tree in a privacy-preserving manner. We prove that the resulting summaries satisfy Differential Privacy (DP), so that no useful inference can be drawn from them about the personal data in any individual event trace. On the technical side, we introduce DPIM, a differentially private approximation of the Inductive Miner. Experimentally, we compare DPIM with the Inductive Miner on 14 real-world event logs using four well-known metrics: fitness, precision, simplicity, and generalization. The experiments show that DPIM not only protects personal data but also produces faithful process trees with little utility loss compared to the Inductive Miner.

Paper Structure

This paper contains 10 sections, 2 theorems, 6 figures, 2 tables, and 5 algorithms.

Key Result

Theorem 1

For a $\Delta_q$-bounded ($\Delta_q \in \mathbb{R}_+$) counting query $q$ and an $\varepsilon > 0$, the Laplace Mechanism $\mathcal{M}_{L,q,\varepsilon}$ is $\varepsilon$-DP.
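
Theorem 1 can be illustrated with a minimal Python sketch of the standard Laplace Mechanism: it adds noise drawn from $\mathrm{Lap}(\Delta_q/\varepsilon)$ to the true answer of a query whose value changes by at most $\Delta_q$ between neighboring logs. The function names and the example query are illustrative assumptions, not taken from the paper; the sampler relies on the fact that the difference of two i.i.d. exponential variables is Laplace-distributed.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    If X, Y ~ Exp(rate = 1/scale) are independent, then X - Y ~ Laplace(0, scale).
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(
    data: Sequence[T],
    query: Callable[[Sequence[T]], float],
    sensitivity: float,
    epsilon: float,
    rng: random.Random,
) -> float:
    """Release query(data) + Lap(sensitivity / epsilon).

    If |query(D) - query(D')| <= sensitivity for all neighboring logs D, D',
    the released value is epsilon-DP (Theorem 1).
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return query(data) + laplace_noise(sensitivity / epsilon, rng)
```

For instance, counting the traces that contain activity `B` changes by at most 1 when a single trace is replaced, so `laplace_mechanism(traces, lambda log: sum("B" in t for t in log), sensitivity=1.0, epsilon=1.0, rng=random.Random(0))` returns an $\varepsilon$-DP estimate with $\varepsilon = 1$. Smaller $\varepsilon$ widens the noise scale $\Delta_q/\varepsilon$, trading accuracy for stronger privacy.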

Figures (6)

  • Figure 1: PST on Trace Variants 1 to 3
  • Figure 2: PST on Variants 1 to 4
  • Figure 3: Exemplary Trace Log
  • Figure 4: IM vs. the non-DP DPIM: Deviation of metrics
  • Figure 5: Fitness, precision, simplicity, and generalization (higher is better) of DPIM for 8 benchmark event logs and 3 privacy parameters (lower $\varepsilon$ means stronger privacy). The dots on the y-axis indicate the respective performance of the IM.
  • ...and 1 more figure

Theorems & Definitions (4)

  • Definition 1: Bounded Differential Privacy
  • Theorem 1: The Laplace Mechanism is DP
  • Theorem 2
  • Proof
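
For reference, the standard textbook form of bounded differential privacy (Definition 1) and the usual argument behind Theorem 1 are stated below; this is a sketch assuming the paper follows the common definitions, not a reproduction of its proof. Two event logs $D, D'$ of equal size are neighbors ($D \simeq D'$) if $D'$ arises from $D$ by replacing exactly one trace, a mechanism $\mathcal{M}$ is $\varepsilon$-DP if the first condition holds, and the Laplace Mechanism is $\mathcal{M}_{L,q,\varepsilon}(D) = q(D) + Z$ with $Z \sim \mathrm{Lap}(\Delta_q/\varepsilon)$, whose output density at $z$ is denoted $p_D(z)$.

$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S] \quad \text{for all } D \simeq D' \text{ and all measurable } S$$

$$\frac{p_D(z)}{p_{D'}(z)} = \exp\!\left(\frac{\varepsilon \, \bigl(|z - q(D')| - |z - q(D)|\bigr)}{\Delta_q}\right) \le \exp\!\left(\frac{\varepsilon \, |q(D) - q(D')|}{\Delta_q}\right) \le e^{\varepsilon}$$

The first inequality is the triangle inequality, and the second uses $|q(D) - q(D')| \le \Delta_q$; integrating this pointwise density bound over $S$ yields the DP condition.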