Expressive Power and Complexity Results for SIGNAL, an Industry-scale Process Query Language
Timotheus Kampik, Cem Okulmus
TL;DR
This work addresses the need for a formal understanding of industry-scale process query languages by analysing SIGNAL, focusing on its core fragment, the SIGNAL Conjunctive Core ($SCC$). It shows that $SCC$ is more expressive than relational algebra (RA), yet can be captured by a translation to semi-positive $\mathsf{Datalog}$, which yields a polynomial-time upper bound on data complexity (whether a matching lower bound holds is left open). The authors formalise $SCC$, translate it to an extended RA, and then encode it in semi-positive $\mathsf{Datalog}$ to establish its expressive and computational landscape. These results offer a rigorous foundation for extending industry-scale process querying methods (PQMs) and highlight the central role of temporal reasoning in shaping their complexity and capabilities.
Abstract
With the increased adoption of process mining, there is a growing need for practical solutions that work at industry scale. In this context, process querying methods (PQMs) have emerged as an important tool for drawing inferences from event logs. Industry approaches can be expected to differ from academic ones because of practical engineering and business considerations. To understand what is at the core of industry-scale PQMs, a formal analysis of the underlying languages can provide a solid foundation. To this end, we formally analyse SIGNAL, an industry-scale language for querying business process event logs developed by a large enterprise software vendor. The formal analysis shows that the core capabilities of SIGNAL, which we refer to as the SIGNAL Conjunctive Core, are more expressive than relational algebra and thus not captured by standard relational databases. We provide an upper bound on the expressiveness via a reduction to semi-positive Datalog, which also yields a polynomial-time (P) upper bound on the data complexity of evaluating SIGNAL Conjunctive Core queries. The findings provide first insights into how real-world process query languages fundamentally differ from the more prevalent structured query languages for relational databases, and they provide a rigorous foundation for extending the capabilities of the industry-scale state of the art in process data querying.
