Datalog-Expressibility for Monadic and Guarded Second-Order Logic
Manuel Bodirsky, Simon Knäuer, Sebastian Rudolph
TL;DR
This work systematically develops a game-theoretic and CSP-based framework to understand when MSO and GSO sentences over finite structures are expressible in Datalog. By introducing an existential $$(\ell,k)$$-pebble game and leveraging CSP theory for countably and \omega-categorical structures, the authors show that homomorphism-closed MSO/GSO classes decompose into finite unions of CSPs and possess canonical Datalog programs of fixed width. They further analyze the expressive boundaries among Datalog, MSO, and GSO via nested and guarded query formalisms, proving separations and limitations (e.g., that some MSO∩Datalog problems are not capturable by nested guarded queries). The results generalize known correspondences between CSPs and logical formalisms, extend them to guarded second-order contexts, and provide structural insight into the complexity and decidability landscape of query entailment and containment. Overall, the paper advances the synthesis of logic, CSPs, and Datalog, with potential impact on static analysis and database theory for infinite-domain CSPs.
Abstract
We characterise the sentences in Monadic Second-order Logic (MSO) that are over finite structures equivalent to a Datalog program, in terms of an existential pebble game. We also show that for every class C of finite structures that can be expressed in MSO and is closed under homomorphisms, and for all integers l,k, there exists a canonical Datalog program Pi of width (l,k) in the sense of Feder and Verdi. The same characterisations also hold for Guarded Second-order Logic (GSO), which properly extends MSO. To prove our results, we show that every class C in GSO whose complement is closed under homomorphisms is a finite union of constraint satisfaction problems (CSPs) of countably categorical structures. The intersection of MSO and Datalog is known to contain the class of nested monadically defined queries (Nemodeq); likewise, we show that the intersection of GSO and Datalog contains all problems that can be expressed by the more expressive language of nested guarded queries. Yet, by exploiting our results, we can show that neither of the two query languages can serve as a characterization, as we exhibit a query in the intersection of MSO and Datalog that is not expressible in nested guarded queries.
