Discovering Dichotomies for Problems in Database Theory

Neha Makhija

TL;DR

The paper tackles dichotomy classification for resilience, causal responsibility, and minimal factorization in databases, particularly under bag semantics and self-joins. It introduces a unified ILP-based framework that encodes all three problems with polynomial-sized instances, proving that LP relaxations yield PTIME solutions for all known tractable cases, while a MILP variant handles causal responsibility when needed. The authors establish new bag-semantics dichotomies, propose a unified hardness criterion based on Independent Join Paths (IJPs), and present a DLP-based method to certify hardness; they also identify instance-based and 2-MQP tractable classes and show read-once provenance guarantees tractability. This approach simplifies dichotomy proofs in reverse data management and holds promise for extending to other related problems and self-join scenarios.
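
To make the ILP idea above concrete, the following is a minimal sketch of how resilience can be phrased as a 0/1 program: one binary variable per tuple (1 means "delete"), and one constraint per witness requiring that at least one of its tuples is deleted. The query `Q :- R(x, y), S(y, z)`, the toy relations, and the function names are illustrative assumptions, not taken from the paper. The exhaustive search stands in for an ILP solver, and the optional per-tuple `cost` is just one plausible way to model bag multiplicities.

```python
from itertools import combinations

# Hypothetical toy database for the illustrative query Q :- R(x, y), S(y, z).
R = [(1, 2), (3, 2)]
S = [(2, 4), (2, 5)]

def witnesses(R, S):
    """Return one set of tuples per satisfying join result of Q."""
    return [
        frozenset({("R", r), ("S", s)})
        for r in R for s in S
        if r[1] == s[0]
    ]

def resilience_bruteforce(wits, cost=None):
    """Solve  min sum_t cost[t] * X[t]
       s.t.   sum_{t in w} X[t] >= 1  for every witness w,  X[t] in {0, 1}
    by exhaustive search (a stand-in for a real ILP solver; exponential time)."""
    tuples = sorted(set().union(*wits)) if wits else []
    cost = cost or {t: 1 for t in tuples}  # unit costs by default
    best_value, best_set = None, None
    for k in range(len(tuples) + 1):
        for chosen in combinations(tuples, k):
            chosen_set = set(chosen)
            # Feasible iff every witness loses at least one tuple.
            if all(w & chosen_set for w in wits):
                value = sum(cost[t] for t in chosen)
                if best_value is None or value < best_value:
                    best_value, best_set = value, chosen_set
    return best_value, best_set

wits = witnesses(R, S)
value, deleted = resilience_bruteforce(wits)
```

On this toy instance every `R` tuple joins with every `S` tuple, so no single deletion destroys all four witnesses and the optimum deletes one whole relation. Replacing the exhaustive search with an LP solver over the same constraints (with `X[t]` relaxed to the interval [0, 1]) would give the relaxation that the summary refers to.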

Abstract

Dichotomy theorems, which characterize the conditions under which a problem can be solved efficiently, have helped identify important tractability borders for problems such as probabilistic query evaluation, view maintenance, and query containment (among many others). However, dichotomy theorems for many such problems remain elusive under key settings such as bag semantics or for queries with self-joins. This work aims to unearth dichotomies for fundamental problems in reverse data management and knowledge representation. We use a novel approach to discovering dichotomies: instead of creating dedicated algorithms for easy (PTIME) and hard cases (NP-complete), we devise unified algorithms that are guaranteed to terminate in PTIME for easy cases. Using this approach, we discovered new tractable cases for the problem of minimal factorization of provenance formulas as well as dichotomies under bag semantics for the problems of resilience and causal responsibility.

Paper Structure (4 sections, 1 figure)

This paper contains 4 sections and 1 figure.

Figures (1)

  • Figure 1: Overview of complexity results for all self-join free conjunctive queries. The space of all queries is broken down into classes with different complexities as defined in the full papers [makhija2021minfac, makhija2022unified]. Results with a yellow background are new. $\mathtt{RES}$ denotes resilience, $\mathtt{RSP}$ causal responsibility, $\mathtt{FACT}$ minimal factorization, while $\mathtt{PROB}$ denotes probabilistic query evaluation.