
FC-Datalog as a Framework for Efficient String Querying

Owen M. Bell, Joel D. Day, Dominik D. Freydenberger

TL;DR

The paper addresses efficient string querying by extending FC with recursion in FC-Datalog and connecting it to core spanners. It introduces a spectrum of fragments, defined by linearity, determinism, one-letter lookahead, and strict decrease, that all capture LOGSPACE while differing in data and combined complexity. The strictly decreasing (SD) variants achieve linear combined complexity. A key contribution is showing how to simulate deterministic regex (DRX) within tailored FC-Datalog fragments, enabling concise, tractable programs for practical string processing. The framework provides a way to design application-specific fragments that balance expressive power and efficient model checking, independent of spanner dependencies. Overall, the work advances tractable recursion over strings and offers concrete fragments for targeted information extraction tasks.
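To make "deterministic regex" concrete, here is a minimal Python sketch of the standard determinism test, assuming the usual notion (also called one-unambiguous regular expressions). The sketch marks every letter occurrence with a position, computes the first and follow position sets (the Glushkov construction), and rejects when two positions with the same letter compete. That competition is exactly the situation a one-letter-lookahead matcher cannot resolve. The AST classes and function names are hypothetical. The code only checks determinism and is not the paper's FC-Datalog simulation of DRX.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical regex AST. Each Sym carries a unique position index
# (the "marking" step of the Glushkov construction).
@dataclass(frozen=True)
class Sym:
    letter: str
    pos: int

@dataclass(frozen=True)
class Cat:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Star:
    inner: "Regex"

Regex = Union[Sym, Cat, Alt, Star]

def nullable(r: Regex) -> bool:
    """Does r accept the empty word?"""
    if isinstance(r, Sym):
        return False
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return True  # Star

def first(r: Regex) -> set:
    """Positions that can match the first letter of a word in L(r)."""
    if isinstance(r, Sym):
        return {r}
    if isinstance(r, Cat):
        return (first(r.left) | first(r.right)) if nullable(r.left) else first(r.left)
    if isinstance(r, Alt):
        return first(r.left) | first(r.right)
    return first(r.inner)

def last(r: Regex) -> set:
    """Positions that can match the last letter of a word in L(r)."""
    if isinstance(r, Sym):
        return {r}
    if isinstance(r, Cat):
        return (last(r.left) | last(r.right)) if nullable(r.right) else last(r.right)
    if isinstance(r, Alt):
        return last(r.left) | last(r.right)
    return last(r.inner)

def follow(r: Regex, table: dict) -> dict:
    """Fill table[p] with the positions that may directly follow position p."""
    if isinstance(r, Sym):
        table.setdefault(r, set())
    elif isinstance(r, (Cat, Alt)):
        follow(r.left, table)
        follow(r.right, table)
        if isinstance(r, Cat):
            for p in last(r.left):
                table[p] |= first(r.right)
    else:  # Star: the end of one iteration may be followed by the start of the next
        follow(r.inner, table)
        for p in last(r.inner):
            table[p] |= first(r.inner)
    return table

def has_clash(positions: set) -> bool:
    """True if two distinct positions in the set carry the same letter."""
    letters = [p.letter for p in positions]
    return len(letters) != len(set(letters))

def is_deterministic(r: Regex) -> bool:
    if has_clash(first(r)):
        return False
    return not any(has_clash(s) for s in follow(r, {}).values())
```

For example, `b*a`, written as `Cat(Star(Sym("b", 1)), Sym("a", 2))`, is deterministic. By contrast, `(a|b)*a`, written as `Cat(Star(Alt(Sym("a", 1), Sym("b", 2))), Sym("a", 3))`, is not. On a first letter `a`, a matcher cannot tell whether it is at position 1 or position 3.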

Abstract

Core spanners are a class of document spanners that capture the core functionality of IBM's AQL. FC is a logic on strings built around word equations that, when extended with constraints for regular languages, can be seen as a logic for core spanners. The recently introduced FC-Datalog extends FC with recursion, which allows us to define recursive relations for core spanners. Additionally, as FC-Datalog captures P, it is also a tractable version of Datalog on strings. This presents an opportunity for optimization. We propose a series of FC-Datalog fragments with desirable properties in terms of complexity of model checking, expressive power, and efficiency of checking membership in the fragment. This leads to a range of fragments that all capture LOGSPACE, which we further restrict to obtain linear combined complexity. This gives us a framework to tailor fragments for particular applications. To showcase this, we simulate deterministic regex in a tailored fragment of FC-Datalog.
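To illustrate why recursion over strings stays tractable, here is a minimal Python sketch. As in FC, it assumes that variables range over the factors (substrings) of the input word w. Because w has only O(|w|^2) factors, a least-fixpoint evaluation of a fixed program stabilizes after polynomially many additions. The program itself is a hypothetical example recognizing a^n b^n. It is written in an informal rule notation, not the paper's exact syntax, and the sketch is not a proof that FC-Datalog captures P.

```python
def factors(w: str) -> set[str]:
    """All factors (substrings) of w, including the empty word."""
    return {w[i:j] for i in range(len(w) + 1) for j in range(i, len(w) + 1)}

def eval_anbn(w: str) -> bool:
    """Naive least-fixpoint evaluation of the hypothetical program
         R("")  <-                           (base rule)
         R(x)   <- x = "a" . y . "b", R(y)   (recursive rule)
         Ans()  <- R(w)
    where every variable ranges over the factors of w."""
    dom = factors(w)
    rel = {""}  # facts produced by the base rule
    changed = True
    while changed:  # repeat rule application until no new fact appears
        changed = False
        for x in dom:
            if x in rel or len(x) < 2:
                continue
            # The word equation x = a . y . b determines y uniquely,
            # and y is again a factor of w because x is.
            if x[0] == "a" and x[-1] == "b" and x[1:-1] in rel:
                rel.add(x)
                changed = True
    return w in rel
```

For instance, `eval_anbn("aabb")` returns `True` and `eval_anbn("aab")` returns `False`. The loop can add each factor at most once, which bounds the number of rounds.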

Paper Structure

This paper contains 10 sections, 6 theorems, 8 equations.

Key Result

Corollary 9

Combined and expression complexity of linear $\mathsf{FC}$-$\mathsf{Datalog}$ is in $\mathsf{PSPACE}$.
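Corollary 9 concerns linear FC-Datalog. The following Python sketch assumes the usual Datalog notion of linearity: every rule body contains at most one atom whose relation is defined by the program's own rules. Under that assumption, membership in the fragment reduces to a single syntactic pass over the rules. The `Atom`/`Rule` encoding and the use of `"="` to tag word-equation atoms are hypothetical, and the paper's precise definition of the fragment may impose further conditions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    pred: str              # relation name; "=" tags a word-equation atom (hypothetical encoding)
    args: tuple[str, ...]  # variables, or terminal letters inside word equations

@dataclass(frozen=True)
class Rule:
    head: Atom
    body: tuple[Atom, ...]

def intensional(program: list[Rule]) -> set[str]:
    """Relations defined by some rule head (IDB relations)."""
    return {rule.head.pred for rule in program}

def is_linear(program: list[Rule]) -> bool:
    """Each rule body may use at most one IDB atom; word equations never count."""
    idb = intensional(program)
    return all(sum(1 for atom in rule.body if atom.pred in idb) <= 1 for rule in program)

# R(x) <- x = a.y.b, R(y)   and   Ans() <- R(u)   (u: hypothetical variable)
# Each body has one IDB atom, so the program is linear.
linear_prog = [
    Rule(Atom("R", ("x",)), (Atom("=", ("x", "a", "y", "b")), Atom("R", ("y",)))),
    Rule(Atom("Ans", ()), (Atom("R", ("u",)),)),
]

# R(x) <- x = y.z, R(y), R(z): two IDB atoms in one body, so not linear.
nonlinear_prog = [
    Rule(Atom("R", ("x",)), (Atom("=", ("x", "y", "z")), Atom("R", ("y",)), Atom("R", ("z",)))),
]
```

Here `is_linear(linear_prog)` holds and `is_linear(nonlinear_prog)` does not. The check runs in time linear in the size of the program.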

Theorems & Definitions (37)

  • Example 1
  • Example 2
  • Definition 3
  • Example 4
  • Example 5
  • Definition 6
  • Example 7
  • Definition 8
  • Corollary 9
  • Lemma 10
  • ...and 27 more