A minimal base or a direct base? That is the question!
Jaume Baixeries, Amedeo Napoli
TL;DR
The paper investigates how to compute the closure of an attribute set under different dependency bases (the minimal DG-Basis versus the direct cdub and D-basis) and how directness affects closure algorithms. It formalizes the closure operators and the notion of directness, then develops direct variants of the closure algorithms (ClosureDirect, LinClosureDirect, WildClosureDirect) that exploit direct bases, and outlines how to handle D-bases via the phi_0 closure. Extensive experiments on real and synthetic data show that the DG-Basis can be competitive, particularly when it is substantially smaller than the direct bases; the best choice depends on base size and dataset characteristics. The work gives practical guidance for selecting a basis and adapting closure algorithms in FCA and related areas, and highlights open questions about dynamic bases and basis-aware metrics.
Abstract
In this paper we revisit the problem of computing the closure of a set of attributes given a basis of dependencies or implications. This problem is of major interest in logic, in the relational database model, in lattice theory, and in Formal Concept Analysis. A basis of dependencies may have different characteristics, among which are being "minimal", e.g., the Duquenne-Guigues Basis, and being "direct", e.g., the Canonical Basis and the D-basis. Here we propose an extensive experimental study of the impacts of minimality and directness on closure algorithms. The results of the experiments performed on real and synthetic datasets are analyzed in depth, and suggest a different and fresh look at computing the closure of a set of attributes w.r.t. a basis of dependencies. This paper has been submitted to the International Journal of Approximate Reasoning.
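To make the contrast between the two kinds of basis concrete, the following is a minimal Python sketch, not the paper's algorithms. It assumes implications are stored as (premise, conclusion) pairs of frozensets; the function names `closure_naive` and `closure_single_pass` and the toy bases are hypothetical, chosen only for illustration. With an arbitrary basis, the closure is obtained by applying implications repeatedly until nothing changes; with a direct basis, a single pass that tests premises against the original set is enough.

```python
def closure_naive(attrs, implications):
    """Apply implications repeatedly until a fixpoint is reached.

    Works for any basis, but may scan the basis several times.
    """
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in implications:
            # Fire the implication if its premise holds and it adds something.
            if premise <= closed and not conclusion <= closed:
                closed |= conclusion
                changed = True
    return closed


def closure_single_pass(attrs, implications):
    """Scan the basis once, testing premises against the ORIGINAL set.

    This yields the closure only when the basis is direct.
    """
    start = frozenset(attrs)
    closed = set(start)
    for premise, conclusion in implications:
        if premise <= start:
            closed |= conclusion
    return closed


# Hypothetical toy bases over the attributes a, b, c.
non_direct = [
    (frozenset("a"), frozenset("b")),
    (frozenset("b"), frozenset("c")),
]
# Adding a -> c makes the single pass sufficient for every input set.
direct = non_direct + [(frozenset("a"), frozenset("c"))]

print(sorted(closure_naive({"a"}, non_direct)))
print(sorted(closure_single_pass({"a"}, non_direct)))
print(sorted(closure_single_pass({"a"}, direct)))
```

On the non-direct toy basis the single pass misses `c`, because `b -> c` only becomes applicable after `a -> b` has fired. The direct basis avoids this at the cost of one extra implication, which is the size-versus-passes trade-off the experimental study examines.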
