Space efficient implementation of hypergraph dualization in the D-basis algorithm

Skylar Homan, Anoop Krishnadas, Kira Adaricheva

TL;DR

This paper introduces Small Space, a memory-efficient variant of the D-basis algorithm that avoids storing the full set of implications by accumulating attribute-frequency aggregates (tsup) during hypergraph dualization via Reverse Search. The approach preserves output accuracy while dramatically reducing peak memory usage, enabling analysis on larger binary tables. It formalizes a ranking metric (rel_t) to assess attribute relevance to a target without enumerating all implications, and demonstrates substantial memory savings across real datasets with acceptable changes in computational overhead. The empirical results on STEM and Impostor Phenomenon datasets show memory reductions of up to ~98% with faster runtimes and modest increases in instruction counts, highlighting practical scalability for data analysis tasks that rely on D-basis-derived statistics rather than full implication lists.

Abstract

We present a new implementation of the $D$-basis algorithm, called Small Space, which considerably reduces the algorithm's memory usage in data analysis applications. The previous implementation delivers the complete set of implications that hold on the set of attributes of an input binary table. In the new version, the only output is the frequencies with which attributes appear in the antecedents of implications from the $D$-basis that have a fixed consequent attribute. Over the last decade, such frequencies, rather than the implications themselves, have become the primary focus in the analysis of datasets to which the $D$-basis has been applied. The $D$-basis employs a hypergraph dualization algorithm, and a dualization implementation known as Reverse Search allows the frequencies to be computed gradually, without storing all discovered implications. We demonstrate the effectiveness of the Small Space implementation by comparing its runtimes and maximum memory usage with those of the current implementation.
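The idea of computing frequencies gradually, rather than storing every implication first, can be illustrated with a minimal sketch. It assumes that antecedents arrive one at a time from some enumeration routine; `stream_antecedents` is a hypothetical stand-in for that routine, not the paper's Reverse Search code, and the plain occurrence count used here is a simplification that may differ from the aggregate the paper actually accumulates.

```python
from collections import Counter
from typing import FrozenSet, Iterable, Iterator


def stream_antecedents() -> Iterator[FrozenSet[str]]:
    """Hypothetical stand-in for a dualization routine.

    Yields the antecedents Y of implications Y -> x one at a time,
    for a fixed consequent x. The three sets below are made-up data.
    """
    yield frozenset({"a", "b"})
    yield frozenset({"a", "c"})
    yield frozenset({"b", "c", "d"})


def accumulate_frequencies(antecedents: Iterable[FrozenSet[str]]) -> Counter:
    """Count how often each attribute occurs across the streamed antecedents.

    Each antecedent is folded into the counter and then dropped, so memory
    grows with the number of attributes, not the number of implications.
    """
    freq: Counter = Counter()
    for antecedent in antecedents:
        freq.update(antecedent)
    return freq


freq = accumulate_frequencies(stream_antecedents())
```

Because the generator is consumed lazily, no list of antecedents is ever materialized; only the per-attribute counter persists between iterations.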

Paper Structure

This paper contains 9 sections, 1 theorem, 4 equations, 3 figures, 10 tables.

Key Result

Theorem 1

Given (reduced) table $T=(U,A,R)$, consider closure operator $\phi_A$, $x\in A$ and related hypergraph $\mathcal{H}(x)=\langle xD, \{xD\setminus M_1, \dots, xD\setminus M_k\}\rangle$. Then for any non-binary implication $Y\to x$ in the $D$-basis of operator $\phi_A$, set $Y$ is a minimal transversal
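To make the notion of a minimal transversal used in Theorem 1 concrete, the following sketch enumerates the minimal transversals of a small hypergraph by brute force. A transversal is a vertex set that intersects every hyperedge, and it is minimal if no proper subset is also a transversal. The function name and the toy hypergraph are assumptions made for illustration; this exhaustive search is not Reverse Search and is only practical for very small inputs.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List


def minimal_transversals(
    vertices: Iterable[str], edges: Iterable[Iterable[str]]
) -> List[FrozenSet[str]]:
    """Brute-force enumeration of minimal transversals (hitting sets).

    Candidates are visited in order of increasing size, so any transversal
    that contains a previously found one cannot be minimal and is skipped.
    """
    vertex_list = sorted(vertices)
    edge_sets = [frozenset(e) for e in edges]
    found: List[FrozenSet[str]] = []
    for size in range(len(vertex_list) + 1):
        for candidate in combinations(vertex_list, size):
            s = frozenset(candidate)
            # Skip supersets of transversals that were already found.
            if any(t <= s for t in found):
                continue
            # Keep s only if it meets every hyperedge.
            if all(s & e for e in edge_sets):
                found.append(s)
    return found


# Toy hypergraph on {a, b, c} with hyperedges {a, b} and {b, c}.
result = minimal_transversals({"a", "b", "c"}, [{"a", "b"}, {"b", "c"}])
```

For this toy hypergraph, the vertex `b` alone meets both hyperedges, and the only other minimal transversal is `{a, c}`.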

Figures (3)

  • Figure 1: The Galois lattice $L(T)$ of Table \ref{Tab13*}
  • Figure 2: Flow of the $D$-basis algorithm
  • Figure 3: Small Space $D$-basis algorithm

Theorems & Definitions (10)

  • Example 1
  • Example 2
  • Example 3
  • Example 4
  • Example 5
  • Example 6
  • Example 7
  • Theorem 1
  • Example 8
  • Example 9