Space efficient implementation of hypergraph dualization in the D-basis algorithm
Skylar Homan, Anoop Krishnadas, Kira Adaricheva
TL;DR
This paper introduces Small Space, a memory-efficient variant of the D-basis algorithm that avoids storing the full set of implications by accumulating attribute-frequency aggregates (tsup) during hypergraph dualization via Reverse Search. The approach preserves output accuracy while dramatically reducing peak memory usage, enabling analysis on larger binary tables. It formalizes a ranking metric (rel_t) to assess attribute relevance to a target without enumerating all implications, and demonstrates substantial memory savings across real datasets with acceptable changes in computational overhead. The empirical results on STEM and Impostor Phenomenon datasets show memory reductions of up to ~98% with faster runtimes and modest increases in instruction counts, highlighting practical scalability for data analysis tasks that rely on D-basis-derived statistics rather than full implication lists.
Abstract
We present a new implementation of the $D$-basis algorithm, called Small Space, which considerably reduces the algorithm's memory usage for data analysis applications. The previous implementation delivers the complete set of implications that hold on the set of attributes of an input binary table. In the new version, the only output is the frequencies of attributes that appear in the antecedents of implications from the $D$-basis with a fixed consequent attribute. Such frequencies, rather than the implications themselves, have become the primary focus in the analysis of datasets to which the $D$-basis has been applied over the last decade. The $D$-basis algorithm employs hypergraph dualization, and a dualization implementation known as Reverse Search allows the frequencies to be computed gradually, without storing all discovered implications. We demonstrate the effectiveness of the Small Space implementation by comparing its runtime and maximum memory usage with those of the current implementation.
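The idea of accumulating frequencies while implications are being discovered, rather than after all of them have been stored, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The generator `toy_implication_stream` is a hypothetical stand-in for a dualization routine that emits implications one at a time, and the function `antecedent_frequencies` is likewise an illustrative name. The sketch counts raw occurrences of attributes in antecedents and does not attempt to reproduce the aggregates or ranking metric used in the paper.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Iterator, Tuple

# An implication is modelled as (antecedent attributes, consequent attribute).
Implication = Tuple[FrozenSet[str], str]


def toy_implication_stream() -> Iterator[Implication]:
    """Hypothetical stand-in for a routine that yields implications one by one.

    The implications below are made-up toy data, not results from any dataset.
    """
    yield frozenset({"a", "b"}), "t"
    yield frozenset({"a", "c"}), "t"
    yield frozenset({"b"}), "u"
    yield frozenset({"a"}), "t"


def antecedent_frequencies(stream: Iterable[Implication], target: str) -> Counter:
    """Count how often each attribute occurs in antecedents of implications
    whose consequent is `target`, consuming the stream one item at a time."""
    counts: Counter = Counter()
    for antecedent, consequent in stream:
        if consequent == target:
            # Only the running counts are retained; the implication itself
            # is discarded as soon as it has been processed.
            counts.update(antecedent)
    return counts


freq = antecedent_frequencies(toy_implication_stream(), "t")
```

Because the stream is consumed lazily, the memory held by this sketch grows with the number of distinct attributes rather than with the number of implications, which is the trade-off the abstract describes.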
