Posets and Bounded Probabilities for Discovering Order-inducing Features in Event Knowledge Graphs
Christoffer Olling Back, Jakob Grue Simonsen
TL;DR
This work introduces a principled Bayesian framework for the automated discovery of event knowledge graphs (EKGs). It treats EKGs as posets induced from feature-based relations and explores their outcome spaces via linear extensions. It addresses the #P-complete likelihood computation with entropy-based priors and a bespoke branch-and-bound algorithm that uses monotonicity and poset bounds to prune large search spaces. The approach is validated on the BPIC 2017 dataset, where hand-built EKGs are recovered in reasonable time, which highlights the method's potential to automate structure discovery from uncurated data. The study lays the groundwork for extensions to derived features and higher-level predictive patterns. Future directions include stronger evaluation metrics and graph-neural representations.
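The core quantity here is the number of linear extensions e(P) of a poset P. The sketch below shows why exact counting is expensive and why cheap bounds are attractive. It is an illustration under stated assumptions, not the authors' implementation. `count_linear_extensions` is the textbook dynamic program over subsets of already-placed elements. It is exact but exponential in n, consistent with the #P-completeness noted above. `extension_bounds` uses two standard facts. First, P is contained in the weak order that stacks its height levels, so e(P) is at least the product of the level sizes' factorials. Second, a longest chain of length h keeps a fixed relative order in every extension, so e(P) is at most n!/h!. These are not necessarily the bounds used in the paper. `uniform_likelihood` is hypothetical: it *assumes*, for illustration only, a uniform distribution over linear extensions.

```python
import math
from fractions import Fraction
from functools import lru_cache
from typing import Iterable, List, Tuple

Relation = Tuple[int, int]  # (a, b) means a < b; relations are assumed acyclic


def count_linear_extensions(n: int, relations: Iterable[Relation]) -> int:
    """Exact e(P) for a poset on 0..n-1 via DP over subsets: O(2^n * n) time."""
    preds = [0] * n  # bitmask of the direct predecessors of each element
    for a, b in relations:
        preds[b] |= 1 << a
    ways = [0] * (1 << n)  # ways[S] = number of valid orderings of prefix set S
    ways[0] = 1
    for placed in range(1 << n):  # supersets are always visited after subsets
        if ways[placed] == 0:
            continue
        for x in range(n):
            bit = 1 << x
            # x may come next if it is unplaced and all its predecessors are placed
            if not (placed & bit) and (preds[x] & placed) == preds[x]:
                ways[placed | bit] += ways[placed]
    return ways[(1 << n) - 1]


def height_levels(n: int, relations: Iterable[Relation]) -> List[int]:
    """Level of each element = length of the longest chain strictly below it."""
    preds: List[List[int]] = [[] for _ in range(n)]
    for a, b in relations:
        preds[b].append(a)

    @lru_cache(maxsize=None)
    def level(x: int) -> int:
        return 1 + max((level(p) for p in preds[x]), default=-1)

    return [level(x) for x in range(n)]


def extension_bounds(n: int, relations: Iterable[Relation]) -> Tuple[int, int]:
    """Cheap (lower, upper) bounds with lower <= e(P) <= upper."""
    levels = height_levels(n, relations)
    sizes = [levels.count(k) for k in range(max(levels) + 1)] if n else []
    lower = math.prod(math.factorial(s) for s in sizes)  # stacked-levels weak order
    upper = math.factorial(n) // math.factorial(len(sizes))  # longest chain, h = len(sizes)
    return lower, upper


def uniform_likelihood(n: int, relations: List[Relation]) -> Fraction:
    """Hypothetical: probability of one consistent total order, assuming a
    uniform distribution over the linear extensions of P."""
    return Fraction(1, count_linear_extensions(n, relations))
```

For two disjoint chains `[(0, 1), (2, 3)]` on four events, `count_linear_extensions` gives 6 and `extension_bounds` gives `(4, 12)`. Adding the relation `(1, 2)` turns the poset into a single chain with exactly one extension. This illustrates the monotonicity a branch-and-bound search can exploit: adding order relations never increases e(P).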
Abstract
Event knowledge graphs (EKGs) extend the classical notion of a trace to capture multiple, interacting views of a process execution. In this paper, we tackle the open problem of automating EKG discovery from uncurated data through a principled probabilistic framing based on the outcome space resulting from feature-derived partial orders on events. From this framing we derive an EKG discovery algorithm based on statistical inference, rather than on ad hoc heuristics or manual analysis by domain experts. This approach comes at the computational cost of exploring a large, non-convex hypothesis space. In particular, computing the maximum-likelihood term in our objective function requires counting the linear extensions of posets, a problem that is #P-complete in general. Fortunately, bound estimates suffice for model comparison and can be incorporated into a bespoke branch-and-bound algorithm. We establish an upper bound on our objective function and show that it is antitonic (non-increasing) with respect to search depth for branching rules that are monotonic with respect to model inclusion. This permits pruning large portions of the search space, and we show experimentally that it leads to rapid convergence toward optimal solutions consistent with manually built EKGs.
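The pruning argument can be sketched generically. The toy below is *not* the paper's objective. `toy_score` uses hypothetical additive feature gains (`GAINS`) minus a quadratic size penalty, and `toy_bound` is a hand-made optimistic bound for it. The sketch only shows the mechanism. Each child model adds one feature, so branching is monotone with respect to inclusion. The bound dominates the score of every model in a subtree and never increases along a branch. A subtree whose bound does not exceed the best score found so far can therefore be discarded without being explored.

```python
from typing import Callable, FrozenSet, List, Tuple

Score = Callable[[FrozenSet[int]], float]
Bound = Callable[[FrozenSet[int], int], float]


def branch_and_bound(n_features: int, score: Score, bound: Bound) -> Tuple[float, FrozenSet[int]]:
    """Maximise `score` over subsets of features 0..n_features-1 (depth-first).

    A node (chosen, next_idx) branches by adding one feature j >= next_idx, so
    every subset is generated exactly once. `bound(chosen, next_idx)` must be
    >= score(T) for every T that extends `chosen` with features from
    next_idx..n_features-1, and should not increase with depth.
    """
    best_set: FrozenSet[int] = frozenset()
    best_score = score(best_set)
    stack: List[Tuple[FrozenSet[int], int]] = [(best_set, 0)]
    while stack:
        chosen, next_idx = stack.pop()
        if bound(chosen, next_idx) <= best_score:
            continue  # no model in this subtree can beat the incumbent
        for j in range(next_idx, n_features):
            child = chosen | {j}  # frozenset | set -> frozenset
            child_score = score(child)
            if child_score > best_score:
                best_score, best_set = child_score, child
            stack.append((child, j + 1))
    return best_score, best_set


# Hypothetical toy objective: additive feature gains minus a quadratic size penalty.
GAINS = [3.0, -1.0, 2.5, 0.5, -2.0]


def toy_score(chosen: FrozenSet[int]) -> float:
    return sum(GAINS[i] for i in chosen) - 0.5 * len(chosen) ** 2


def toy_bound(chosen: FrozenSet[int], next_idx: int) -> float:
    # Optimistic: add every remaining positive gain but charge only the current
    # penalty, which is valid because the penalty never decreases as features are added.
    return toy_score(chosen) + sum(max(g, 0.0) for g in GAINS[next_idx:])
```

On this toy, `branch_and_bound(len(GAINS), toy_score, toy_bound)` returns `(3.5, frozenset({0, 2}))`, the same optimum that exhaustive enumeration of all 32 subsets gives. Most subtrees, for example every branch starting from feature 1, 3 or 4, are pruned after a single bound evaluation.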
