Differentially Private Inductive Miner

Max Schulze; Yorck Zisgen; Moritz Kirschte; Esfandiar Mohammadi; Agnes Koschmider

TL;DR

The paper addresses the privacy risks of event traces in process mining by introducing the Differentially Private Inductive Miner (DPIM), a differentially private approximation of the Inductive Miner that learns a process structure tree (PST) from event logs by privatizing the directly-follows relation (DP-DFR). It proves $\varepsilon$-differential privacy for the PST construction. An evaluation on 14 real-world logs shows that DPIM preserves high utility (e.g., fitness around $0.95$, precision around $0.9$, simplicity around $0.7$, generalization around $0.8$) while mitigating privacy leakage. The method relies on rejection sampling and DP mechanisms (e.g., Laplace noise) applied to trace-derived features, so that no single trace disproportionately influences the resulting model. This enables privacy-preserving process discovery and, potentially, the generation of synthetic logs, which reduces re-identification risk in shared process models and supports safer data analysis and dissemination.
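To make "DP mechanisms applied to trace-derived features" concrete, the following Python sketch adds Laplace noise to per-pair directly-follows counts. It is not the paper's DP-DFR procedure, which according to the summary above also uses rejection sampling. The function names, the publicly known activity alphabet, the per-pair budget `epsilon_per_pair`, and the `threshold` are all assumptions made for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Trace = List[str]
Pair = Tuple[str, str]


def directly_follows_counts(log: List[Trace]) -> Counter:
    """Count, per pair (a, b), the traces in which b directly follows a.

    A trace contributes at most 1 to each pair, so every per-pair count
    changes by at most 1 when a single trace is replaced.
    """
    counts: Counter = Counter()
    for trace in log:
        counts.update(set(zip(trace, trace[1:])))
    return counts


def noisy_directly_follows(
    log: List[Trace],
    activities: List[str],      # assumed to be public knowledge
    epsilon_per_pair: float,    # hypothetical per-pair privacy budget
    threshold: float,           # hypothetical cut-off for keeping a pair
    rng: random.Random,
) -> Dict[Pair, float]:
    """Illustrative only: noise every possible pair count, keep large ones."""
    if epsilon_per_pair <= 0:
        raise ValueError("epsilon_per_pair must be positive")
    rate = epsilon_per_pair     # Laplace scale is 1 / epsilon_per_pair
    counts = directly_follows_counts(log)
    kept: Dict[Pair, float] = {}
    # Iterate over ALL pairs of the public alphabet, not only observed ones,
    # so that the mere presence of a pair in the output is also noised.
    for a in activities:
        for b in activities:
            # Difference of two i.i.d. exponentials is Laplace-distributed.
            noise = rng.expovariate(rate) - rng.expovariate(rate)
            noisy = counts[(a, b)] + noise
            if noisy >= threshold:
                kept[(a, b)] = noisy
    return kept
```

Under basic sequential composition, releasing all $|A|^2$ noisy counts this way costs at most $|A|^2 \cdot$ `epsilon_per_pair` in total, which is a loose bound and one reason a dedicated construction such as DPIM is preferable to this naive sketch.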

Abstract

Protecting personal data about individuals, such as event traces in process mining, is an inherently difficult task, since an event trace leaks information about the path in a process model that an individual has triggered. Yet prior anonymization methods for event traces, such as k-anonymity or event log sanitization, struggle to protect against such leakage, in particular against adversaries with sufficient background knowledge. In this work, we provide a method that tackles the challenge of summarizing sensitive event traces by learning the underlying process tree in a privacy-preserving manner. We prove via the so-called Differential Privacy (DP) property that no useful inference about any personal data in an event trace can be drawn from the resulting summaries. On the technical side, we introduce a differentially private approximation (DPIM) of the Inductive Miner. Experimentally, we compare our DPIM with the Inductive Miner on 14 real-world event traces by evaluating well-known metrics: fitness, precision, simplicity, and generalization. The experiments show that our DPIM not only protects personal data but also generates faithful process trees that exhibit little utility loss compared to the Inductive Miner.

Paper Structure (10 sections, 2 theorems, 6 figures, 2 tables, 5 algorithms)

Introduction
Problem Statement
Related Work
Preliminaries
Approach
High-level Description of DPIM
Detailed Description of the Subalgorithms
Privacy Guarantee
Evaluation
Conclusion

Key Result

Theorem 1

For a $\Delta_q$-bounded ($\Delta_q \in \mathbb{R}_+$) counting query $q$ and an $\varepsilon > 0$, the Laplace Mechanism $\mathcal{M}_{L,q,\varepsilon}$ is $\varepsilon$-DP.
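As an illustration of Theorem 1, the sketch below implements the standard Laplace mechanism in Python: it releases $q(D)$ plus noise drawn from a Laplace distribution with scale $\Delta_q / \varepsilon$. The function names are hypothetical, and the code assumes the usual textbook formulation rather than the paper's exact notation. It uses ordinary floating-point sampling and is not hardened against the known floating-point attacks on naive Laplace implementations.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(
    true_count: float,    # q(D), the exact answer of the counting query
    sensitivity: float,   # Delta_q, the bound on how much q can change
    epsilon: float,       # privacy parameter, must be > 0
    rng: random.Random,
) -> float:
    """Release true_count plus Laplace noise of scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # Smaller epsilon means a larger noise scale, i.e. stronger privacy.
    for eps in (0.1, 1.0, 10.0):
        print(eps, laplace_mechanism(100.0, 1.0, eps, rng))
```

The noise scale grows as $\varepsilon$ shrinks, which matches the trade-off noted in the Figure 5 caption: lower $\varepsilon$ means stronger privacy at the cost of noisier answers.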

Figures (6)

Figure 1: PST on Trace Variants 1 to 3
Figure 2: PST on Variants 1 to 4
Figure 3: Exemplary Trace Log
Figure 4: IM vs. the non-DP DPIM: Deviation of metrics
Figure 5: Fitness, precision, simplicity, and generalization (higher is better) of DPIM for 8 benchmark event logs and 3 privacy parameters (lower $\varepsilon$ means stronger privacy). The dots at the Y-axis indicate the respective performance of the IM.
...and 1 more figure

Theorems & Definitions (4)

Definition 1: Bounded Differential Privacy
Theorem 1: The Laplace Mechanism is DP
Theorem 2
proof
