GreCon3: Mitigating High Resource Utilization of GreCon Algorithms for Boolean Matrix Factorization

Petr Krajča, Martin Trnecka

Abstract

Boolean matrix factorization (BMF) is a fundamental tool for analyzing binary data and discovering latent information hidden in it. Formal Concept Analysis (FCA) provides essential insight into BMF and the design of BMF algorithms. FCA gave rise to the GreCon and GreCon2 algorithms, which produce high-quality factorizations at the cost of high memory consumption and long running times. In this paper, we introduce GreCon3, a substantial revision of these algorithms that significantly improves both computational efficiency and memory usage. These improvements are achieved with a novel space-efficient data structure that tracks unprocessed data. We further propose a novel strategy that initializes this data structure incrementally, which reduces memory consumption and omits data irrelevant to the remainder of the computation. Moreover, we show that the first factors can be discovered with less effort. Since the first factors tend to describe large portions of the data, this optimization, along with others, contributes significantly to the overall improvement in the algorithm's performance. An experimental evaluation shows that GreCon3 substantially outperforms its predecessor GreCon2. The proposed algorithm thus advances the state of the art in FCA-based BMF and enables efficient factorization of datasets that were previously infeasible for the GreCon algorithm.
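
As a rough illustration of what an FCA-based factorization looks like, the sketch below builds an object-factor matrix and a factor-attribute matrix from a set of formal concepts (extent, intent pairs) and checks that their Boolean product reproduces the input. This is not the GreCon3 algorithm: the 3x3 matrix, the two concepts, and the helper names `boolean_product` and `factors_from_concepts` are assumptions chosen for this example, and the concepts are given by hand rather than selected greedily.

```python
def boolean_product(A, B):
    """Boolean matrix product: entry (i, j) is OR over l of (A[i][l] AND B[l][j])."""
    k, m = len(B), len(B[0])
    return [
        [int(any(row[l] and B[l][j] for l in range(k))) for j in range(m)]
        for row in A
    ]


def factors_from_concepts(concepts, n_rows, n_cols):
    """Turn (extent, intent) pairs into an object-factor matrix A
    and a factor-attribute matrix B (hypothetical helper)."""
    A = [[int(i in extent) for extent, _ in concepts] for i in range(n_rows)]
    B = [[int(j in intent) for j in range(n_cols)] for _, intent in concepts]
    return A, B


# Hypothetical input matrix (rows are objects, columns are attributes).
I = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]

# Two formal concepts of I, i.e. maximal rectangles full of ones,
# written as (extent, intent) with 0-based row and column indices.
concepts = [
    ({0, 1}, {0, 1}),
    ({1, 2}, {1, 2}),
]

A, B = factors_from_concepts(concepts, n_rows=3, n_cols=3)
reconstructed = boolean_product(A, B)

# The two concepts cover every 1 in I, so the factorization is exact.
print(reconstructed == I)
```

The entry in row 1, column 1 is covered by both concepts; under the Boolean product it stays 1, whereas the ordinary matrix product would give 2.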

Paper Structure (21 sections, 10 equations, 5 figures, 3 tables, 7 algorithms)

Figures (5)

  • Figure 1: Example of a binary matrix.
  • Figure 2: GreCon2's representation of the binary matrix from Fig. 1.
  • Figure 3: Example of a sparse representation of the $cells$ array for the input matrix from Fig. 1.
  • Figure 4: Example of the binary matrix (left) and formal concepts in this matrix (right).
  • Figure 5: Binary matrix from Fig. 4 after discovery of the factor concept $c_1$ (i.e., $\mathcal{F} = \{c_1\}$), with concepts $c_2$ (left) and $c_3$ (right) highlighted.

Theorems & Definitions (7)

  • Example
  • Remark
  • Remark
  • Remark
  • Remark
  • Remark
  • Remark