BBK: a simpler, faster algorithm for enumerating maximal bicliques in large sparse bipartite graphs
Alexis Baudin, Clémence Magnien, Lionel Tabourier
TL;DR
BBK addresses the challenge of exhaustively enumerating maximal bicliques in large sparse bipartite graphs by adapting Bron-Kerbosch to a bipartite setting through a clique-extended view and a novel projection-extended neighborhood. The method introduces a bidegeneracy order on U, efficient initialization, early maximality tests, and pivot-based pruning that operates on the original bipartite neighborhoods, achieving substantial practical speedups. The paper provides rigorous input- and output-based complexity analyses and demonstrates speedups of roughly an order of magnitude over the prior state of the art on massive real-world datasets, with an open-source C++ implementation. This work advances scalable biclique mining with direct implications for dense-subgraph discovery and frequent-itemset analysis in large bipartite networks.
Abstract
Bipartite graphs are a prevalent modeling tool for real-world networks, capturing interactions between vertices of two different types. Within this framework, bicliques emerge as crucial structures when studying dense subgraphs: they are sets of vertices such that all vertices of the first type interact with all vertices of the second type. They therefore make it possible to identify groups of closely related vertices of the network, such as individuals with similar interests or webpages with similar contents. This article introduces a new algorithm designed for the exhaustive enumeration of maximal bicliques within a bipartite graph. This algorithm, called BBK for Bipartite Bron-Kerbosch, is a new extension to the bipartite case of the Bron-Kerbosch algorithm, which enumerates the maximal cliques in standard (non-bipartite) graphs. It is faster than the state-of-the-art algorithms and allows the enumeration on massive bipartite graphs that are not manageable with existing implementations. We analyze it theoretically to establish two complexity formulas: one as a function of the input and one as a function of the output characteristics of the algorithm. We also provide an open-access implementation of BBK in C++, which we use to experiment and validate its efficiency on massive real-world datasets and show that its execution time is shorter in practice than that of state-of-the-art algorithms. These experiments also show that both the order in which the vertices are processed and the choice of which of the two types of vertices to initiate the enumeration on have an impact on the computation time.
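To make the notion of a maximal biclique concrete, here is a minimal brute-force sketch in Python. It is not the BBK algorithm and uses none of its pruning or ordering techniques; it only illustrates the definition on a tiny, made-up graph. The function names, the adjacency representation (each vertex of one side mapped to its set of neighbors on the other side), and the example data are assumptions chosen for illustration, and bicliques with an empty side are ignored.

```python
from itertools import combinations


def is_biclique(adj, X, Y):
    """Return True if every vertex of X is adjacent to every vertex of Y."""
    return all(Y <= adj[u] for u in X)


def maximal_bicliques(adj):
    """Brute-force enumeration of maximal bicliques with two non-empty sides.

    adj maps each vertex of the first side (U) to the set of its neighbors
    on the second side (V). Exponential in |U|: only usable on toy inputs.
    """
    U = list(adj)
    found = set()
    for k in range(1, len(U) + 1):
        for X in combinations(U, k):
            # Y: vertices of V adjacent to every vertex of X.
            Y = set.intersection(*(adj[u] for u in X))
            if not Y:
                continue
            # Close X: keep every vertex of U adjacent to all of Y.
            # The pair (X_closed, Y) can then not be extended on either side.
            X_closed = frozenset(u for u in U if Y <= adj[u])
            found.add((X_closed, frozenset(Y)))
    return found


# Hypothetical toy graph: U = {u1, u2, u3}, V = {a, b, c}.
adj = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"c"},
}

for X, Y in sorted(maximal_bicliques(adj), key=lambda p: sorted(p[0])):
    print(sorted(X), sorted(Y))
```

On this toy graph the sketch reports three maximal bicliques: ({u1, u2}, {a, b}), ({u2}, {a, b, c}) and ({u2, u3}, {c}). The pair ({u1}, {a, b}) is a biclique but not a maximal one, since u2 can be added to it.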
