Scalable $k$-clique Densest Subgraph Search
Xiaowei Ye, Miao Qiao, Rong-Hua Li, Qi Zhang, Guoren Wang
TL;DR
This work addresses scalable $k$-clique densest subgraph search ($k$-$\mathsf{DSS}$) by introducing an SCT-based convex programming formulation ${\mathsf{SCT\text{-}CP}}(G)$ and solving it with a Frank-Wolfe algorithm, ${\mathsf{PSCTL}}$, whose running time is independent of the number of $k$-cliques $|\mathcal{C}_k(V)|$. To scale to massive graphs, it also introduces a polynomial-time sampling method, ${\mathsf{CPSample}}$, which uses ${\mathsf{CCPATH}}$ to sample $k$-cliques uniformly and estimates density with provable accuracy guarantees. Theoretical results link ${\mathsf{SCT\text{-}CP}}(G)$ to near-optimal $k$-$\mathsf{DSS}$ solutions through bounds such as $\rho_k(H(r^*)) \ge (1 - 1/(k|V^*|)) \rho_k(V^*)$, and the sampling method carries Chernoff-based guarantees that ensure reliable approximations under mild conditions. Experiments on 12 large real-world graphs show orders-of-magnitude speedups over state-of-the-art methods: ${\mathsf{CPSample}}$ handles networks with up to $1.8\times 10^9$ edges with competitive accuracy, while ${\mathsf{PSCTL}}$ remains efficient and memory-friendly thanks to the SCT-based framework. The authors also release open-source code, enabling reproducibility and practical adoption in large-scale network analysis.
Abstract
In this paper, we present a collection of novel and scalable algorithms for the $k$-clique densest subgraph problem ($k$-$\mathsf{DSS}$) in network analysis. We propose ${\mathsf{PSCTL}}$, a novel Frank-Wolfe-based algorithm that addresses $k$-$\mathsf{DSS}$ by solving a distinct convex programming problem. ${\mathsf{PSCTL}}$ approximates $k$-$\mathsf{DSS}$ with near-optimal guarantees. Its notable advantage is a time complexity that is independent of the number of $k$-cliques, which makes it highly efficient in practice. Additionally, we present ${\mathsf{CPSample}}$, a sampling-based algorithm that handles networks of unprecedented scale, with up to $1.8\times 10^9$ edges. By using the ${\mathsf{CCPATH}}$ algorithm as a uniform $k$-clique sampler, ${\mathsf{CPSample}}$ processes large-scale network data efficiently, and we provide a detailed analysis of its accuracy guarantees. Together, these contributions represent a significant advance in $k$-clique densest subgraph discovery. In experimental evaluations, our algorithms run orders of magnitude faster than the current state-of-the-art solutions.
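
To make the objective concrete, the following is a minimal brute-force sketch of the quantity being maximized, assuming the standard definition of $k$-clique density, $\rho_k(S) = |\mathcal{C}_k(S)| / |S|$, where $\mathcal{C}_k(S)$ is the set of $k$-cliques induced by the vertex set $S$. It is not the authors' method: it enumerates every vertex subset and every $k$-clique, which is exactly the cost the algorithms above are designed to avoid. All function names (`build_adj`, `k_cliques`, `k_clique_density`, `brute_force_densest`) and the toy graph are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_adj(edges):
    """Build an undirected adjacency map {vertex: set(neighbors)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, nodes, k):
    """Enumerate all k-cliques among `nodes` by brute force."""
    nodes = sorted(nodes)
    return [
        c for c in combinations(nodes, k)
        if all(v in adj[u] for u, v in combinations(c, 2))
    ]


def k_clique_density(adj, nodes, k):
    """Assumed definition: number of induced k-cliques divided by |nodes|."""
    nodes = list(nodes)
    if not nodes:
        return 0.0
    return len(k_cliques(adj, nodes, k)) / len(nodes)


def brute_force_densest(adj, k):
    """Exhaustive search over vertex subsets; exponential, toy graphs only."""
    best, best_density = (), 0.0
    vertices = sorted(adj)
    for size in range(k, len(vertices) + 1):
        for subset in combinations(vertices, size):
            d = k_clique_density(adj, subset, k)
            if d > best_density:
                best, best_density = subset, d
    return set(best), best_density


# Toy graph: a 4-clique {0, 1, 2, 3} with a pendant path 3-4-5.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
adj = build_adj(edges)
dense_set, density = brute_force_densest(adj, 3)
print(dense_set, density)
```

On this toy graph with $k = 3$, the 4-clique contains four triangles on four vertices, so its density is 1, whereas the whole graph has the same four triangles spread over six vertices. The exhaustive search is only feasible for a handful of vertices, which is why a method whose cost does not depend on $|\mathcal{C}_k(V)|$, or one that samples cliques instead of enumerating them, matters at scale.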
