Efficient Enumeration of Large Maximal k-Plexes
Qihao Cheng, Da Yan, Tianhao Wu, Lyuheng Yuan, Ji Cheng, Zhongyi Huang, Yang Zhou
TL;DR
The paper tackles exact enumeration of all maximal $k$-plexes of size at least $q$ in large graphs, a problem that is NP-hard for general $k$. It introduces a fast branch-and-bound framework that partitions the search space into independent tasks using seed subgraphs derived from a degeneracy ordering, together with a novel pivot strategy that maximizes saturated vertices to shrink the candidate set. An upper-bounding technique and three vertex-pair pruning rules cut away unpromising branches, while a task-based parallelization with a timeout mechanism mitigates stragglers and preserves cache locality. The approach outperforms state-of-the-art methods both sequentially (up to $5\times$ speedup) and in parallel (up to $18.9\times$ with 16 threads), and ablations show that the pruning techniques yield up to $7\times$ speedup over the basic algorithm. These results enable efficient discovery of large, cohesive subgraphs in biological and social networks, where $k$-plexes provide a robust alternative to cliques on noisy data.
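The seed-based partitioning can be illustrated with a minimal sketch. It assumes an adjacency map from integer vertices to neighbor sets and the common convention $q \geq 2k - 1$, under which any $k$-plex with at least $q$ vertices has diameter at most 2. The helper names `degeneracy_order` and `seed_tasks` are hypothetical, and the paper's actual seed subgraph construction and extra pruning may differ: each task here simply collects the seed's later neighbors and later two-hop vertices.

```python
def degeneracy_order(adj):
    """Return vertices in a degeneracy (smallest-last) ordering.

    Repeatedly removes a remaining vertex of minimum current degree
    (ties broken by vertex id). This simple version is O(n^2), not the
    linear-time bucket-queue implementation.
    """
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    removed = set()
    order = []
    while len(order) < len(adj):
        v = min((u for u in adj if u not in removed), key=lambda u: (deg[u], u))
        order.append(v)
        removed.add(v)
        for w in adj[v]:
            if w not in removed:
                deg[w] -= 1
    return order


def seed_tasks(adj, k, q):
    """Split the search into one independent task per seed vertex (hypothetical scheme).

    Task for seed v only looks for k-plexes whose earliest vertex in the
    ordering is v, so every other member must come later. With q >= 2k - 1,
    any non-neighbor of v in such a k-plex shares a common neighbor with v
    inside the k-plex, hence is reachable through a later neighbor of v.
    """
    if q < 2 * k - 1:
        raise ValueError("two-hop candidate restriction assumes q >= 2k - 1")
    order = degeneracy_order(adj)
    pos = {v: i for i, v in enumerate(order)}
    tasks = []
    for v in order:
        # One-hop candidates: neighbors that come later in the ordering.
        n1 = {u for u in adj[v] if pos[u] > pos[v]}
        # Two-hop candidates: later vertices reached through a later neighbor.
        n2 = {w for u in n1 for w in adj[u] if pos[w] > pos[v]} - n1
        # Drop seeds whose candidate pool cannot reach q vertices.
        if 1 + len(n1) + len(n2) >= q:
            tasks.append((v, n1, n2))
    return tasks
```

Processing seeds in smallest-last order means each seed has at most $d$ later neighbors, where $d$ is the graph's degeneracy, which tends to keep individual tasks small on sparse graphs.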
Abstract
Finding cohesive subgraphs in a large graph has many important applications, such as community detection and biological network analysis. A clique is often too strict a cohesive structure, since communities or biological modules rarely form cliques for reasons such as data noise. Therefore, the $k$-plex has been introduced as a popular clique relaxation: a graph in which every vertex is adjacent to all but at most $k$ vertices (counting itself), i.e., every vertex of a $k$-plex $P$ has at least $|P| - k$ neighbors in $P$. In this paper, we propose a fast branch-and-bound algorithm, as well as its task-based parallel version, to enumerate all maximal $k$-plexes with at least $q$ vertices. Our algorithm adopts an effective search space partitioning approach that yields a lower time complexity, a new pivot vertex selection method that reduces the size of the candidate vertex set, an effective upper-bounding technique to prune useless branches, and three novel pruning techniques based on vertex pairs. Our parallel algorithm uses a timeout mechanism to eliminate straggler tasks, and maximizes cache locality while ensuring load balancing. Extensive experiments show that, compared with the state-of-the-art algorithms, our sequential and parallel algorithms enumerate large maximal $k$-plexes with up to $5\times$ and $18.9\times$ speedup, respectively. Ablation results also demonstrate that our pruning techniques bring up to $7\times$ speedup over our basic algorithm.
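To make the definition concrete, the sketch below checks the $k$-plex condition and enumerates all maximal $k$-plexes with at least $q$ vertices by exhaustive search. It is a hypothetical reference oracle for tiny graphs (for example, to cross-check a faster implementation), not the paper's branch-and-bound algorithm; `is_kplex` and `maximal_kplexes_bruteforce` are illustrative names, and the graph is assumed to be a map from vertex ids to neighbor sets without self-loops.

```python
from itertools import combinations


def is_kplex(adj, S, k):
    """True if every vertex in S has at least |S| - k neighbors in S.

    Equivalently, each vertex misses at most k vertices of S counting
    itself, so a 1-plex is exactly a clique.
    """
    S = set(S)
    return all(len(adj[v] & S) >= len(S) - k for v in S)


def maximal_kplexes_bruteforce(adj, k, q):
    """Enumerate all maximal k-plexes with at least q vertices by brute force.

    Exponential in the number of vertices. Since every subset of a k-plex
    is again a k-plex, S is maximal iff no single outside vertex extends it.
    """
    vertices = sorted(adj)
    result = []
    for size in range(q, len(vertices) + 1):
        for S in combinations(vertices, size):
            members = set(S)
            if not is_kplex(adj, members, k):
                continue
            extendable = any(
                is_kplex(adj, members | {u}, k) for u in vertices if u not in members
            )
            if not extendable:
                result.append(S)
    return result
```

On the path 0-1-2-3 with $k = 2$ and $q = 3$, this returns the two maximal 2-plexes `(0, 1, 2)` and `(1, 2, 3)`; a 4-cycle is a 2-plex but not a 1-plex (clique), because each vertex misses itself and its opposite vertex.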
