EdgeMiner: Distributed Process Mining at the Data Sources

Julia Andersen; Patrick Rathje; Christian Imenkamp; Agnes Koschmider; Olaf Landsiedel

TL;DR

This paper presents EdgeMiner, an algorithm for distributed process mining that runs directly on sensor nodes over a stream of real-time event data. EdgeMiner determines the predecessor of each event efficiently, reducing the communication overhead by up to 96% compared to querying all nodes.

Abstract

Process mining is moving beyond traditional event logs and nowadays also includes, for example, data sourced from sensors in the Internet of Things (IoT). The volume and velocity of the data generated by such sensors make it increasingly challenging for traditional process discovery algorithms, which operate on a centralized event log, to process the data efficiently. This paper presents EdgeMiner, an algorithm for distributed process mining that operates directly on sensor nodes over a stream of real-time event data. In contrast to centralized algorithms, EdgeMiner tracks each event together with its predecessor and successor events directly on the sensor node where the event is sensed and recorded. As EdgeMiner aggregates direct successions on the individual nodes, the raw data does not need to be stored centrally, which improves both scalability and privacy. We show the correctness of EdgeMiner both analytically and experimentally. In addition, our evaluation results show that EdgeMiner determines the predecessor of each event efficiently, reducing the communication overhead by up to 96% compared to querying all nodes. Further, we show that the number of queried nodes stabilizes after relatively few events, and that batching predecessor queries in groups reduces the average number of nodes queried per event to less than 2.5% of all nodes.
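To make the per-node bookkeeping concrete, here is a minimal Python sketch of the event-ordering step (cf. the Phase 1 caption of Figure 2). It is an illustration under stated assumptions, not the authors' implementation. It assumes that events are processed in timestamp order, that a node answers a predecessor query positively only if it holds the case's most recent event that has not yet been assigned a successor (so at most one node can answer positively), and that the Most-Frequent-Predecessors (MFP) optimization means querying first the nodes that have supplied predecessors most often. The class `Node` and all of its fields and methods are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical sensor node; all names here are illustrative assumptions."""
    name: str
    # case_id -> activity of this node's latest event that has no successor yet
    open_events: Dict[str, str] = field(default_factory=dict)
    # partial footprint matrix: (predecessor activity, activity) -> count
    partial_fm: Counter = field(default_factory=Counter)
    # activities sensed here that turned out to be start events
    start_activities: set = field(default_factory=set)
    # which nodes supplied predecessors for events sensed here (for MFP ordering)
    pred_sources: Counter = field(default_factory=Counter)

    def answer_query(self, case_id: str) -> Optional[str]:
        """Positive answer iff this node holds the case's open event.
        Answering also marks that event as succeeded by removing it."""
        return self.open_events.pop(case_id, None)

    def sense(self, case_id: str, activity: str, nodes: List["Node"]) -> int:
        """Handle a newly sensed event and return the number of nodes queried."""
        # Assumed MFP ordering: ask the most frequent predecessor sources first.
        order = sorted(nodes, key=lambda n: -self.pred_sources[n.name])
        queried = 0
        for other in order:
            queried += 1
            pred = other.answer_query(case_id)
            if pred is not None:
                # Record the direct succession pred -> activity locally.
                self.partial_fm[(pred, activity)] += 1
                self.pred_sources[other.name] += 1
                break
        else:
            # No node answered positively: this is a start event.
            self.start_activities.add(activity)
        # The new event is now the case's open (successor-less) event.
        self.open_events[case_id] = activity
        return queried


if __name__ == "__main__":
    a, b, c = Node("A"), Node("B"), Node("C")
    nodes = [a, b, c]
    for case in ("case1", "case2"):
        queried = [n.sense(case, act, nodes) for n, act in ((a, "a"), (b, "b"), (c, "c"))]
        print(case, queried)  # case1 [3, 1, 2], then case2 [3, 1, 1]
```

With two cases of the trace a, b, c, the node sensing `c` needs two queries for the first case but only one for the second, because it has learned that node `B` usually holds the predecessor. Start events still require querying every node, which is consistent with the caption of Figure 4 crediting both MFP Requesting and knowledge of the start events for the savings.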

Paper Structure (37 sections, 2 equations, 9 figures, 2 tables, 1 algorithm)

This paper contains 37 sections, 2 equations, 9 figures, 2 tables, 1 algorithm.

Introduction
Preliminaries and Background
Related Work
Design
Assumptions and Setting
Assumptions on Events and Event Logs
EdgeMiner Setting
EdgeMiner Algorithm
Phase 1 -- Event Ordering & Partial Footprint Matrices
Phase 2 -- Requesting a Footprint Matrix
Optimizations
Most-Frequent-Predecessors (MFPs)
Batching
Sliding Window
Correctness
...and 22 more sections

Figures (9)

Figure 1: While traditional process mining (left) collects all events at a central entity, EdgeMiner (right) processes them directly at the source and exchanges only aggregates (partial footprint matrices), which increases scalability and privacy.
Figure 2: Phase 1 -- Event Ordering and Partial Footprint Matrix Construction in EdgeMiner: Without a central entity, nodes determine the order of events collaboratively using message passing. In this example, event 1 is a start event: after detecting it, the node queries the other nodes for its predecessor, receives no positive response, and therefore marks it as a start event. Upon sensing event 2, the detecting node queries for its predecessor and receives a response listing event 1 as the predecessor. It stores this information in its local FM.
Figure 3: Phase 2 -- Requesting a Footprint Matrix: We request partial FMs and start/end activity flags from all nodes. Upon receiving the data, we concatenate the matrices, form start and end activity sets, and compute the footprint matrix.
Figure 4: Average number of nodes queried with and without MFP Requesting, including standard error. MFP Requesting and knowledge of the start events reduce communication demands by a factor of 7.5 to 30 depending on the dataset.
Figure 5: Fitness over time. Intermediate FMs quickly converge to the centrally computed FM. The BPIC 2017 dataset, for example, already reaches a fitness of over 90% after 200 events.
...and 4 more figures
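The Phase 2 step from the Figure 3 caption (collect partial FMs and start/end flags, concatenate, compute the footprint matrix) can be sketched as below. This is a hedged illustration: it assumes each node reports its partial FM as a `Counter` of direct-succession counts plus sets of flagged start and end activities, and it derives relations with the standard Alpha-algorithm footprint rules, which we assume match the paper's definitions. The function `request_footprint` and the string symbols `"->"`, `"<-"`, `"||"`, `"#"` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, Iterable, Set, Tuple

Report = Tuple[Counter, Set[str], Set[str]]  # (partial FM, start flags, end flags)


def request_footprint(reports: Iterable[Report]):
    """Merge per-node reports and derive footprint relations (Alpha-style)."""
    merged: Counter = Counter()
    starts: Set[str] = set()
    ends: Set[str] = set()
    for partial_fm, start_flags, end_flags in reports:
        merged.update(partial_fm)  # concatenate: add the direct-succession counts
        starts |= start_flags
        ends |= end_flags
    # Include flagged activities that never appear in a direct succession.
    activities = sorted({x for pair in merged for x in pair} | starts | ends)
    relations: Dict[Tuple[str, str], str] = {}
    for a, b in product(activities, repeat=2):
        ab, ba = merged[(a, b)] > 0, merged[(b, a)] > 0
        if ab and not ba:
            relations[(a, b)] = "->"  # causality a -> b
        elif ba and not ab:
            relations[(a, b)] = "<-"  # causality b -> a
        elif ab and ba:
            relations[(a, b)] = "||"  # both orders observed
        else:
            relations[(a, b)] = "#"   # never directly follow each other
    return relations, starts, ends
```

Only counts are exchanged here, never raw events, which mirrors the privacy argument in the Figure 1 caption.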

Theorems & Definitions (3)

Definition 1: Direct Succession
Definition 2: Causality, No Direct Succession
Claim 1
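Definitions 1 and 2 are listed by title only. Assuming they follow the standard footprint relations of the Alpha algorithm, which the footprint-matrix terminology suggests, they read as follows for an event log $L$ and activities $a, b$ (the parallel relation $\parallel_L$ is the usual fourth footprint entry and is included here as an assumption):

```latex
\begin{align*}
a >_L b &\iff \exists\, \sigma = \langle t_1, \dots, t_n \rangle \in L,\ \exists\, i \in \{1, \dots, n-1\}:\ t_i = a \wedge t_{i+1} = b \\
a \rightarrow_L b &\iff a >_L b \wedge b \not>_L a \\
a \parallel_L b &\iff a >_L b \wedge b >_L a \\
a \mathrel{\#_L} b &\iff a \not>_L b \wedge b \not>_L a
\end{align*}
```

Because $>_L$ only relates adjacent events within a single trace, it can be computed from each event's predecessor alone, which is why per-node partial FMs are sufficient to reconstruct the full footprint matrix.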