
The Implication Problem for Functional Dependencies and Variants of Marginal Distribution Equivalences

Minna Hirvonen

TL;DR

The paper studies the implication problem for a combined class of dependencies: functional dependencies (FDs) together with two unary probabilistic atoms, unary marginal identity (UMI) and unary marginal distribution equivalence (UMDE). It gives a sound and complete axiomatization, which is infinite, proves that no finite axiomatization exists, and shows that the class has Armstrong relations. The FD+UMI fragment can be simulated by FDs and unary inclusion atoms (FD+UIND), which places its implication problem in polynomial time; a dedicated polynomial-time algorithm extends this bound to the full FD+UMI+UMDE class. The results clarify how relational dependencies interact with probabilistic marginal properties in uni-relational settings and give an efficient way to reason about such constraints, with implications for database design and probabilistic data management.

Abstract

We study functional dependencies together with two different probabilistic dependency notions: unary marginal identity and unary marginal distribution equivalence. A unary marginal identity states that two variables x and y are identically distributed. A unary marginal distribution equivalence states that the multiset consisting of the marginal probabilities of all the values for variable x is the same as the corresponding multiset for y. We present a sound and complete axiomatization for the class of these dependencies and show that it has Armstrong relations. The axiomatization is infinite, but we show that there can be no finite axiomatization. The implication problem for the subclass that contains only functional dependencies and unary marginal identities can be simulated with functional dependencies and unary inclusion atoms, and therefore the problem is in polynomial time. This complexity bound also holds in the case of the full class, which we show by constructing a polynomial-time algorithm.
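To make the three dependency notions concrete, the following minimal sketch checks them on a small finite probabilistic relation. The representation (a list of `(assignment, probability)` pairs with rational weights summing to 1) and all function names are assumptions made for illustration; they are not taken from the paper, and the sketch only checks satisfaction on one relation, not implication.

```python
from collections import Counter, defaultdict
from fractions import Fraction


def marginal(rows, var):
    """Marginal distribution of `var`: value -> total probability (support only)."""
    dist = defaultdict(Fraction)  # Fraction() is 0, so missing values start at 0
    for assignment, prob in rows:
        if prob > 0:
            dist[assignment[var]] += prob
    return dict(dist)


def satisfies_umi(rows, x, y):
    """Unary marginal identity: x and y are identically distributed."""
    return marginal(rows, x) == marginal(rows, y)


def satisfies_umde(rows, x, y):
    """Unary marginal distribution equivalence: the multisets of marginal
    probabilities of x and y coincide (the values themselves may differ)."""
    return Counter(marginal(rows, x).values()) == Counter(marginal(rows, y).values())


def satisfies_fd(rows, lhs, rhs):
    """Functional dependency: rows agreeing on `lhs` also agree on `rhs`."""
    seen = {}
    for assignment, prob in rows:
        if prob == 0:
            continue  # assumption: zero-weight rows are not part of the relation
        key = tuple(assignment[v] for v in lhs)
        val = tuple(assignment[v] for v in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical example relation over variables x, y, z.
rows = [
    ({"x": 0, "y": 1, "z": "a"}, Fraction(1, 4)),
    ({"x": 1, "y": 0, "z": "b"}, Fraction(1, 4)),
    ({"x": 1, "y": 1, "z": "b"}, Fraction(1, 2)),
]

print(satisfies_umi(rows, "x", "y"))     # x and y share the same distribution
print(satisfies_umi(rows, "x", "z"))     # different value sets, so not identical
print(satisfies_umde(rows, "x", "z"))    # same multiset of probabilities
print(satisfies_fd(rows, ["x"], ["z"]))  # x determines z
print(satisfies_fd(rows, ["y"], ["x"]))  # y = 1 occurs with two x-values
```

In this example x and z have different values but the same multiset of marginal probabilities {1/4, 3/4}, so the marginal distribution equivalence holds between them while the marginal identity does not. This illustrates that the equivalence is the weaker of the two notions.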

Paper Structure

This paper contains 11 sections, 9 theorems, 12 equations, 3 figures, 5 tables.

Key Result

lemma 1

Let $\Sigma$ be a set of FDs, UMIs and UMDEs that is closed under the inference rules, i.e. $\mathsf{cl}(\Sigma)=\Sigma$. Then the graph $G(\Sigma)$ has the following properties:

Figures (3)

  • Figure 1: The graph $G(\Sigma)$ of Example \ref{['example']}. For the sake of clarity, we have removed from $G(\Sigma)$ all self-loops and some edges that are implied by transitivity.
  • Figure 2: The graph $G(\Sigma)$ of Example \ref{['example4']}.
  • Figure 3: The graph $G(\mathsf{cl}(\Sigma_k\setminus\{\delta\}))$ in the two cases (a) $\delta=\mathop{=\!}(x_0,x_1)$ and (b) $\delta=x_k\approx^*x_0$. For the sake of clarity, all the self-loops have been removed.

Theorems & Definitions (14)

  • definition 1
  • definition 2
  • definition 3: Strongly connected component
  • definition 4: Clique
  • lemma 1
  • definition 5
  • lemma 2
  • lemma 3
  • lemma 4
  • lemma 5
  • ...and 4 more