Approximate Integrity Constraints in Incomplete Databases With Limited Domains
Munqath Al-atar, Attila Sali
TL;DR
This work extends the notion of strongly possible constraints to multivalued dependencies and cross joins in incomplete databases by restricting imputations to the active domain. It introduces the add-based approximation measure $g_5$ and proves $g_3(K) \ge g_5(K)$ for spKeys and spFDs, illustrating that additions can be more effective than deletions in achieving constraint satisfaction. The paper also defines spMVDs and spCJs, analyzes their complexity (noting that single-case checking can be polynomial while general spCJ verification is NP-complete), and presents a comprehensive comparison of approximation measures across different constraint types. Overall, the results offer a framework for imputing data and assessing near-satisfaction of strong constraints in incomplete data, with implications for data imputation strategies and constraint-based data cleaning.
Abstract
In the case of incomplete database tables, a possible world is obtained by replacing each missing value with a value from the corresponding attribute's domain, which may be infinite. A possible key or possible functional dependency constraint is satisfied by an incomplete table if some possible world satisfies the given key or functional dependency. On the other hand, a certain key or certain functional dependency holds if all possible worlds satisfy the constraint. A strongly possible constraint is an intermediate concept between possible and certain constraints, based on the strongly possible world approach: a strongly possible world is obtained by replacing each NULL with a value that already appears in the corresponding attribute of the table. A strongly possible key or functional dependency holds in an incomplete table if there exists a strongly possible world that satisfies the given constraint. In the present paper, we introduce strongly possible versions of multivalued dependencies and cross joins, and we analyse the complexity of checking the validity of a given strongly possible cross join. We also study approximation measures of strongly possible keys (spKeys), functional dependencies (spFDs), multivalued dependencies (spMVDs) and cross joins (spCJs), and we treat complexity questions concerning the determination of the approximation values.
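
The definition of a strongly possible world can be made concrete with a small brute-force sketch. It is only an illustration of the definition, not the paper's algorithm: it enumerates every strongly possible world, which takes exponential time in the number of missing values. The names `NULL`, `strongly_possible_worlds` and `is_sp_key`, and the representation of a table as a list of tuples, are assumptions introduced here for the example.

```python
from itertools import product

NULL = None  # hypothetical marker for a missing value


def strongly_possible_worlds(table):
    """Yield every total table obtained by replacing each NULL with a value
    that already appears in the same column (the column's active domain)."""
    n_cols = len(table[0])
    # Active domain of each column: the non-null values visible in that column.
    domains = [
        sorted({row[c] for row in table if row[c] is not NULL})
        for c in range(n_cols)
    ]
    # Positions (row, column) of all missing values.
    holes = [
        (r, c)
        for r, row in enumerate(table)
        for c, value in enumerate(row)
        if value is NULL
    ]
    # One candidate world per combination of replacement values.
    # If some column with a hole has an empty active domain, nothing is yielded.
    for choice in product(*(domains[c] for _, c in holes)):
        world = [list(row) for row in table]
        for (r, c), value in zip(holes, choice):
            world[r][c] = value
        yield [tuple(row) for row in world]


def is_sp_key(table, key_cols):
    """Return True if some strongly possible world has pairwise distinct
    projections on key_cols, i.e. key_cols is a strongly possible key."""
    for world in strongly_possible_worlds(table):
        projections = [tuple(row[c] for c in key_cols) for row in world]
        if len(set(projections)) == len(projections):
            return True
    return False


# Column 0 only ever shows the value 1, so every completion repeats a row.
t1 = [(1, NULL), (1, 2), (NULL, 3)]
# Column 0 shows 1 and 2, which leaves room for a collision-free completion.
t2 = [(1, NULL), (2, 2), (NULL, 3)]

print(is_sp_key(t1, [0, 1]))  # False
print(is_sp_key(t2, [0, 1]))  # True
```

In `t1` both columns together form a possible key when replacement values may come from an infinite domain (for example, fill the missing entries with fresh values), but not a strongly possible key, because the active domain of the first column contains only the value 1. This is the gap between possible and strongly possible constraints that the abstract describes.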
