Posets and Bounded Probabilities for Discovering Order-inducing Features in Event Knowledge Graphs

Christoffer Olling Back, Jakob Grue Simonsen

TL;DR

This work introduces a principled Bayesian framework for automated discovery of event knowledge graphs by treating EKGs as posets induced from feature-based relations and exploring their outcome spaces via linear extensions. It tackles #P-complete likelihood computations with entropy-based priors and a bespoke branch-and-bound algorithm that prunes large search spaces using monotonicity and poset bounds. The approach is validated on the BPIC 2017 dataset, showing that handbuilt EKGs can be recovered in reasonable time and highlighting the method's potential to automate structure discovery from uncurated data. The study lays groundwork for extending to derived features and higher-level predictive patterns, with future directions including stronger evaluation metrics and graph-neural representations.

Abstract

Event knowledge graphs (EKGs) extend the classical notion of a trace to capture multiple, interacting views of a process execution. In this paper, we tackle the open problem of automating EKG discovery from uncurated data through a principled probabilistic framing based on the outcome space resulting from feature-derived partial orders on events. From this we derive an EKG discovery algorithm based on statistical inference rather than on an ad hoc or heuristic-based strategy or on manual analysis by domain experts. This approach comes at the computational cost of exploring a large, non-convex hypothesis space. In particular, solving the maximum likelihood term in our objective function involves counting the number of linear extensions of posets, which is #P-complete in general. Fortunately, bound estimates suffice for model comparison and admit incorporation into a bespoke branch-and-bound algorithm. We establish an upper bound on our objective function, which we show to be antitonic w.r.t. search depth for branching rules that are monotonic w.r.t. model inclusion. This allows pruning of large portions of the search space, which we show experimentally leads to rapid convergence toward optimal solutions that are consistent with manually built EKGs.
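
To make the counting problem in the abstract concrete, the following is a minimal sketch of exact linear-extension counting for a small poset. It is not the paper's algorithm: the function name `count_linear_extensions`, the integer encoding of events, and the bitmask recursion are assumptions chosen for illustration. The recursion is exponential in the number of events in the worst case, which is in line with the #P-completeness noted above and suggests why bounds are attractive for larger posets.

```python
from functools import lru_cache


def count_linear_extensions(n, relations):
    """Count linear extensions of a poset on events 0..n-1.

    relations: iterable of (a, b) pairs meaning event a must precede event b.
    Hypothetical helper for illustration only; feasible for small n.
    """
    # preds[x] is a bitmask of the events that must come before x.
    preds = [0] * n
    for a, b in relations:
        preds[b] |= 1 << a

    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def count(placed):
        # placed is a bitmask of events already put into the total order.
        if placed == full:
            return 1
        total = 0
        for x in range(n):
            already_placed = (placed >> x) & 1
            preds_done = (preds[x] & ~placed) == 0
            if not already_placed and preds_done:
                total += count(placed | (1 << x))
        return total

    return count(0)


# Five pairwise incomparable events: every ordering is valid, so 5! of them.
free_events = count_linear_extensions(5, [])
# A chain 0 < 1 < 2 < 3 admits exactly one total order.
chain = count_linear_extensions(4, [(0, 1), (1, 2), (2, 3)])
# A diamond 0 < {1, 2} < 3 admits two total orders.
diamond = count_linear_extensions(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
```

The memoization key is the set of already placed events, so the work grows with the number of down-sets of the poset rather than with the number of linear extensions.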

Paper Structure

This paper contains 14 sections, 10 theorems, 35 equations, 7 figures, and 3 tables.

Key Result

Proposition 1

Given posets $D \coloneqq (E, \prec)$ and $D' \coloneqq (E, \prec')$ over the same set $E$, a model $\mathcal{M}(E)$, and a df-path generator $g_\mathcal{M}$, the poset $D$ extends $g_\mathcal{M}(D')$ and $D'$ extends $g_\mathcal{M}(D)$ if and only if $g_\mathcal{M}(D) = g_\mathcal{M}(D')$.

Figures (7)

  • Figure 1: Transitive reduction of the poset induced by feature relations in Tab. \ref{tab:event_table}.
  • Figure 2: The df-path generator $g_\mathcal{M}$ is in general not injective: it will typically map several different partial orders over the same set and model $\mathcal{M}$ to one new partial order.
  • Figure 3: Repeated decomposition with bounds from Cor. \ref{cor:bounds-disjoint} and \ref{cor:bounds-minelement}. First, for free events $F$ we establish $|\mathcal{E}_{F}| = 5!$, then by disjoint decomposition on $F$ and $Q$ we establish our first bounds. Next, minimal element decomposition is applied to establish tighter bounds using $R$ and $S$. Finally, minimal element decomposition followed by disjoint decomposition is applied, resulting in posets $T,U,V,W,X$.
  • Figure 4: Convergence results after four hours across varying numbers of samples ($5 \leq N \leq 40$) and poset/sample sizes ($4 \leq |D| \leq 512$). Lines begin at the first completed estimate; those that end indicate that the algorithm finished before the four-hour cutoff.
  • Figure 5: Event knowledge graphs for loan application number 681547497 from the BPIC 2017 event log. Fig. (\ref{fig:their-ekg-681547497}) is the handbuilt EKG from \cite{esser_multi-dimensional_2021}; Fig. (\ref{fig:our-ekg-apptype-681547497}) is derived from the model most commonly discovered by our algorithm.
  • ...and 2 more figures
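
The two decompositions named in the Figure 3 caption can be illustrated with standard counting identities for linear extensions. The block below is a sketch under stated assumptions: it takes $|\mathcal{E}_{P}|$ to denote the number of linear extensions of a poset $P$ (consistent with $|\mathcal{E}_{F}| = 5!$ for five free events), and it shows exact identities, whereas the cited corollaries state bounds whose precise form is not reproduced on this page. The chain $Q$ in the worked example is hypothetical.

```latex
% Disjoint decomposition (assumed setting): P and Q share no events and
% no order relations, so their linear extensions interleave freely.
|\mathcal{E}_{P \sqcup Q}| = \binom{|P| + |Q|}{|P|} \, |\mathcal{E}_{P}| \, |\mathcal{E}_{Q}|

% Minimal element decomposition: every linear extension begins with
% some minimal element m of P, followed by an extension of the rest.
|\mathcal{E}_{P}| = \sum_{m \in \min(P)} |\mathcal{E}_{P \setminus \{m\}}|

% Hypothetical worked example: F is 5 free events, Q is a chain of 2 events.
|\mathcal{E}_{F}| = 5! = 120, \qquad |\mathcal{E}_{Q}| = 1, \qquad
|\mathcal{E}_{F \sqcup Q}| = \binom{7}{5} \cdot 120 \cdot 1 = 21 \cdot 120 = 2520 = \frac{7!}{2}
```

The final equality is a sanity check: of the $7!$ orderings of seven events, exactly half place the two chained events in the required order.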

Theorems & Definitions (25)

  • Definition 1: Atomic feature relation
  • Example 1
  • Definition 2: Derived feature relation
  • Example 2
  • Definition 3: df-path generator
  • Example 3
  • Proposition 1
  • proof
  • Definition 4: Normalized Shannon entropy
  • Definition 5: $\eta$-based model prior
  • ...and 15 more
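
Definition 4 is listed above by name only, so the following is a minimal sketch of one common convention for normalized Shannon entropy, not necessarily the paper's definition: the empirical entropy of a feature's values divided by the logarithm of the number of distinct values, giving a score in $[0, 1]$. The function name `normalized_entropy` and the choice of normalizer are assumptions.

```python
import math
from collections import Counter


def normalized_entropy(values):
    """Empirical Shannon entropy of `values`, scaled to [0, 1].

    Assumed convention: divide by log(k), where k is the number of
    distinct observed values. The paper's Definition 4 may differ.
    """
    counts = Counter(values)
    n = len(values)
    k = len(counts)
    if k <= 1:
        # An empty or constant feature carries no information.
        return 0.0
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(k)


# A perfectly balanced binary feature has maximal normalized entropy.
balanced = normalized_entropy(["a", "b", "a", "b"])
# A constant feature has zero entropy.
constant = normalized_entropy(["a", "a", "a"])
# A skewed feature falls strictly between the two extremes.
skewed = normalized_entropy(["a", "a", "a", "b"])
```

Under this convention, a feature whose value is unique per event and a feature that never varies sit at opposite ends of the scale, which is the kind of distinction an entropy-based prior over features could exploit.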