Targeted Least Cardinality Candidate Key for Relational Databases

Vasileios Nakos, Hung Q. Ngo, Charalampos E. Tsourakakis

TL;DR

TCAND asks for a minimum-cardinality attribute set $X$ whose closure under a given FD set $\mathcal{F}$ contains a target set $T$, i.e., $T \subseteq X^+$, generalizing the classical least cardinality candidate key problem. The authors formulate TCAND as a layered set-cover problem and provide an exact IP, along with LP relaxations that enable deterministic and randomized rounding schemes; they show an $(f+1)^D$-approximation for $D$ rounds and establish a 1-round equivalence to Red Blue Set Cover, which yields both new algorithms and strong inapproximability results under the Dense-vs-Random conjecture. They also prove integrality gaps for the LP that scale exponentially with the number of rounds, underscoring fundamental limits of LP-based methods. The work connects TCAND to semantic optimization in query engines and delineates theoretical barriers via the connection to Red Blue Set Cover and the Dense-vs-Random conjecture.

Abstract

Functional dependencies (FDs) are a central theme in databases, playing a major role in the design of database schemas and the optimization of queries. In this work, we introduce the targeted least cardinality candidate key problem (TCAND). This problem is defined over a set of functional dependencies $F$ and a target variable set $T \subseteq V$, and it aims to find the smallest set $X \subseteq V$ such that the FD $X \to T$ can be derived from $F$. The TCAND problem generalizes the well-known NP-hard problem of finding the least cardinality candidate key \cite{lucchesi1978candidate}, which has previously been shown to be at least as difficult as the set cover problem. We present an integer programming (IP) formulation for the TCAND problem, analogous to a layered set cover problem. We analyze its linear programming (LP) relaxation from two perspectives: we propose two approximation algorithms and investigate the integrality gap. Our findings indicate that the approximation upper bounds for our algorithms are not significantly improvable through LP rounding, a notable distinction from the standard set cover problem. Additionally, we discover that a generalization of the TCAND problem is equivalent to a variant of the set cover problem, named red-blue set cover \cite{carr1999red}, which cannot be approximated within a sub-polynomial factor in polynomial time under plausible conjectures \cite{chlamtavc2023approximating}. Despite the long history of the least cardinality candidate key problem, our research contributes new theoretical insights and novel algorithms, and demonstrates that the general TCAND problem poses complexities beyond those encountered in the set cover problem.
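To make the problem statement concrete, the following is a minimal sketch of attribute closure and an exhaustive TCAND search. It is not one of the paper's algorithms: the function names `closure` and `tcand_brute_force`, the FD encoding as `(lhs, rhs)` pairs of frozensets, and the toy schema are all assumptions made for illustration, and the search takes exponential time.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) frozenset pairs."""
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD lhs -> rhs once all of lhs is derivable.
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed

def tcand_brute_force(universe, fds, target):
    """Smallest X within `universe` whose closure contains `target`."""
    attrs = sorted(universe)
    # Try candidate sets in order of increasing size, so the first hit is optimal.
    for size in range(len(attrs) + 1):
        for cand in combinations(attrs, size):
            if set(target) <= closure(cand, fds):
                return set(cand)
    return None  # only reached if target is not contained in universe

# Hypothetical toy schema: A -> B, B -> C, CD -> E.
V = {"A", "B", "C", "D", "E"}
F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("B"), frozenset("C")),
    (frozenset("CD"), frozenset("E")),
]
T = {"C", "E"}
best = tcand_brute_force(V, F, T)
print(best)
```

On this toy instance no single attribute derives both C and E, so the optimum has size two. Setting `target` to the whole universe recovers the classical least cardinality candidate key problem.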

Paper Structure (12 sections, 11 theorems, 15 equations, 2 figures, 3 algorithms)

This paper contains 12 sections, 11 theorems, 15 equations, 2 figures, 3 algorithms.

Key Result

Theorem 1

There exists an $O(m^{1/3} \log^{4/3} n \log k)$-approximation algorithm for the Red Blue Set Cover problem where $m$ is the number of sets, $n$ is the number of red elements, and $k$ is the number of blue elements.
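For readers unfamiliar with the problem in Theorem 1, here is a small sketch of Red Blue Set Cover in its standard form: choose a subfamily of the $m$ sets that covers every blue element while covering as few red elements as possible. The exhaustive search below only illustrates the objective and is not the approximation algorithm of the theorem; the function name `red_blue_set_cover` and the toy instance are assumptions.

```python
from itertools import combinations

def red_blue_set_cover(sets, red, blue):
    """Exhaustively find a subfamily covering all blue elements
    that covers the fewest red elements. Returns (cost, indices)."""
    best_cost, best_choice = None, None
    for r in range(len(sets) + 1):
        for choice in combinations(range(len(sets)), r):
            covered = set().union(*(sets[i] for i in choice))
            if blue <= covered:
                # Cost counts distinct red elements covered, not sets chosen.
                cost = len(covered & red)
                if best_cost is None or cost < best_cost:
                    best_cost, best_choice = cost, choice
    return best_cost, best_choice

# Hypothetical toy instance with m = 3 sets, n = 3 red, k = 2 blue elements.
red = {"r1", "r2", "r3"}
blue = {"b1", "b2"}
sets = [
    {"b1", "r1"},
    {"b2", "r1"},
    {"b1", "b2", "r2", "r3"},
]
cost, choice = red_blue_set_cover(sets, red, blue)
print(cost, choice)
```

In this instance the single set covering both blue elements costs two red elements, while the two smaller sets together cost only one, which shows how the objective differs from minimizing the number of chosen sets.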

Figures (2)

  • Figure 1: Visual representation of the exact IP (\ref{ipexact}).
  • Figure 2: The Boolean variables $x_1,\ldots,x_n$ denote whether attribute $i$ is active, i.e., whether it is included in the closure of the selected set of variables. The target variables are all set to 1.

Theorems & Definitions (20)

  • Theorem 1: Chlamtáč et al. \cite{chlamtavc2023approximating}
  • Conjecture 1: \cite{Bhaskara+10, chlamtac2012everywhere, chlamtavc2017minimizing}
  • Theorem 2: Chlamtáč et al. \cite{chlamtavc2023approximating}
  • Lemma 1: Hajnal-Szemerédi \cite{HajnalSzemeredi}
  • Definition 1
  • Theorem 3
  • proof
  • Corollary 1
  • Theorem 4
  • proof
  • ...and 10 more