BBK: a simpler, faster algorithm for enumerating maximal bicliques in large sparse bipartite graphs

Alexis Baudin, Clémence Magnien, Lionel Tabourier

TL;DR

BBK addresses the exhaustive enumeration of maximal bicliques in large sparse bipartite graphs by adapting the Bron-Kerbosch algorithm to the bipartite setting through a clique-extended view and a novel projection-extended neighborhood. The method introduces a bidegeneracy order on U, efficient initialization, early maximality tests, and pivot-based pruning that operates on the original bipartite neighborhoods. The paper provides input- and output-based complexity analyses and reports speedups of roughly an order of magnitude over the prior state of the art on massive real-world datasets, with an open-source C++ implementation. This work advances scalable biclique mining, with direct implications for dense-subgraph discovery and frequent-itemset analysis in large bipartite networks.

Abstract

Bipartite graphs are a prevalent modeling tool for real-world networks, capturing interactions between vertices of two different types. Within this framework, bicliques emerge as crucial structures when studying dense subgraphs: they are sets of vertices such that all vertices of the first type interact with all vertices of the second type. They therefore make it possible to identify groups of closely related vertices of the network, such as individuals with similar interests or webpages with similar content. This article introduces a new algorithm designed for the exhaustive enumeration of maximal bicliques within a bipartite graph. This algorithm, called BBK for Bipartite Bron-Kerbosch, is a new extension to the bipartite case of the Bron-Kerbosch algorithm, which enumerates the maximal cliques in standard (non-bipartite) graphs. It is faster than the state-of-the-art algorithms and allows enumeration on massive bipartite graphs that are not manageable with existing implementations. We analyze it theoretically to establish two complexity formulas: one as a function of the input and one as a function of the output characteristics of the algorithm. We also provide an open-access implementation of BBK in C++, which we use to validate its efficiency experimentally on massive real-world datasets and to show that its execution time is shorter in practice than that of state-of-the-art algorithms. These experiments also show that the order in which the vertices are processed, as well as the choice of which of the two types of vertices to initiate the enumeration on, have an impact on the computation time.
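
To make the definition of a maximal biclique concrete, here is a minimal Python sketch of a brute-force maximality check. It is not taken from the paper or its C++ implementation: the function names, the adjacency representation (a dictionary from each vertex of the first type to its set of neighbors of the second type), and the toy graph are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical representation: each vertex of the first type (U side)
# maps to the set of its neighbors of the second type (V side).
Adjacency = Dict[str, Set[str]]


def is_biclique(adj: Adjacency, X: Set[str], Y: Set[str]) -> bool:
    """True if every vertex of X is adjacent to every vertex of Y."""
    return all(Y <= adj[u] for u in X)


def is_maximal_biclique(adj: Adjacency, V: Set[str],
                        X: Set[str], Y: Set[str]) -> bool:
    """True if (X, Y) is a biclique that no single vertex can extend."""
    if not is_biclique(adj, X, Y):
        return False
    # A vertex of U outside X that is adjacent to all of Y would extend X.
    if any(Y <= adj[u] for u in adj.keys() - X):
        return False
    # A vertex of V outside Y that is adjacent to all of X would extend Y.
    if any(all(v in adj[u] for u in X) for v in V - Y):
        return False
    return True


# Toy graph (illustrative only): u1 -- a, b and u2 -- b, c.
toy_adj = {"u1": {"a", "b"}, "u2": {"b", "c"}}
toy_V = {"a", "b", "c"}
```

On this toy graph, `({"u1"}, {"b"})` is a biclique but not a maximal one, because `u2` is also adjacent to `b`. Testing every candidate pair this way is exponential, which is why enumeration algorithms such as BBK avoid it.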

Paper Structure

This paper contains 21 sections, 4 theorems, 6 equations, 4 figures, 3 tables, 2 algorithms.

Key Result

Theorem 3.1

Let $G = (U,V,E)$ be a bipartite graph. Then the maximal cliques of $G^{C}$ correspond to the maximal bicliques of $G$:
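
The correspondence can be checked on a small example. The sketch below is in Python and is not the paper's BBK algorithm. It assumes that $G^{C}$ is obtained from $G$ by adding every edge inside $U$ and every edge inside $V$, which is our reading of the term "clique-extended graph", and it runs the classic Bron-Kerbosch algorithm with pivoting on the result. The edge list is inferred from the maximal bicliques named in the Figure 1 caption, not copied from the paper, and the function names are hypothetical.

```python
from itertools import combinations


def clique_extended(U, V, edges):
    """Adjacency of G^C (assumed construction): the edges of G plus
    every U-U pair and every V-V pair."""
    adj = {x: set() for x in U | V}
    for side in (U, V):
        for a, b in combinations(sorted(side), 2):
            adj[a].add(b)
            adj[b].add(a)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def bron_kerbosch(adj, R, P, X, out):
    """Classic Bron-Kerbosch with pivoting; appends each maximal clique to out."""
    if not P and not X:
        out.append(frozenset(R))
        return
    # Pivot with the most neighbors in P, to skip as many branches as possible.
    pivot = max(P | X, key=lambda w: len(P & adj[w]))
    for v in sorted(P - adj[pivot]):
        bron_kerbosch(adj, R | {v}, P & adj[v], X & adj[v], out)
        P = P - {v}
        X = X | {v}


# Edge list inferred from the Figure 1 caption (an assumption).
U = {"1", "2", "3"}
V = {"A", "B", "C", "D", "E"}
edges = [("1", "A"), ("1", "B"),
         ("2", "B"), ("2", "C"), ("2", "D"),
         ("3", "B"), ("3", "C"), ("3", "D"), ("3", "E")]

cliques = []
bron_kerbosch(clique_extended(U, V, edges), set(), U | V, set(), cliques)
found = set(cliques)
for c in sorted(found, key=sorted):
    print(sorted(c))
```

Under these assumptions the maximal cliques found are exactly the five vertex sets listed in the Figure 1 caption, including $\{A,B,C,D,E\}$, whose $U$ side is empty. Materializing $G^{C}$ costs space quadratic in $|U| + |V|$, so this is only a demonstration of the theorem. According to the summary above, BBK prunes using the original bipartite neighborhoods instead.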

Figures (4)

  • Figure 1: Example of a bipartite graph, with three maximal bicliques circled in color: $\{1,A,B\}$, $\{2,3,B,C,D\}$ and $\{3,B,C,D,E\}$. Note that this graph has two other maximal bicliques, $\{1,2,3,B\}$ and $\{A,B,C,D,E\}$, not represented here for the sake of clarity.
  • Figure 2: Computation times of the BBK algorithm on the datasets of Table \ref{tab:data} compared to those of the oombea algorithm. On the rightmost graphs, the values for oombea are not displayed because the computation did not complete within the one-week limit of the experiments.
  • Figure 3: Ratio of the execution time of Algorithm \ref{algo:bbk} (BBK) when run on the larger set $V$ to its execution time when run on the smaller set $U$. Results in blue correspond to graphs where $\overline{b_U} < \overline{b_V}$, and results in orange to graphs where $\overline{b_U} > \overline{b_V}$.
  • Figure 4: Memory used by the two algorithms BBK and oombea on the datasets of Table \ref{tab:data}. For the four rightmost graphs, oombea cannot complete the enumeration in less than one week, so its result is not displayed.

Theorems & Definitions (13)

  • Definition 3.1: Clique-extended graph of a bipartite graph
  • Definition 3.2: Clique-extended neighborhood of a vertex
  • Theorem 3.1
  • Definition 3.3: Projection-extended neighborhood
  • Definition 3.4: Bidegeneracy order of $U$
  • Definition 3.5: Bidegeneracy of a vertex
  • Definition 3.6: Bidegeneracy of $U$ and $V$
  • Lemma 4.1
  • proof
  • Theorem 4.1
  • ...and 3 more