Scalable $k$-clique Densest Subgraph Search

Xiaowei Ye, Miao Qiao, Rong-Hua Li, Qi Zhang, Guoren Wang

TL;DR

This work scales $k$-clique densest subgraph search ($k$-$\mathsf{DSS}$) by introducing an SCT-based convex programming formulation ${\mathsf{SCT\text{-}CP}}(G)$ and solving it with a Frank–Wolfe algorithm, ${\mathsf{PSCTL}}$, whose running time is independent of the number of $k$-cliques $|\mathcal{C}_k(V)|$. To scale to massive graphs, it also introduces a polynomial-time sampling method, ${\mathsf{CPSample}}$, which uses ${\mathsf{CCPATH}}$ to sample $k$-cliques uniformly and estimates density with provable accuracy guarantees. Theoretical results link ${\mathsf{SCT\text{-}CP}}(G)$ to near-optimal $k$-$\mathsf{DSS}$ solutions, with bounds such as $\rho_k(H(r^*)) \ge (1 - 1/(k|V^*|)) \rho_k(V^*)$, and the sampling method carries Chernoff-based guarantees that hold under mild conditions. Experiments on 12 large real-world graphs show orders-of-magnitude speedups over state-of-the-art methods: ${\mathsf{CPSample}}$ handles networks with up to $1.8\times 10^9$ edges with competitive accuracy, while ${\mathsf{PSCTL}}$ remains efficient and memory-friendly thanks to the SCT-based framework. The authors also release open-source code, supporting reproducibility and adoption in large-scale network analysis.

Abstract

In this paper, we present a collection of novel and scalable algorithms designed to tackle the challenges inherent in the $k$-clique densest subgraph problem ($k$-$\mathsf{DSS}$) within network analysis. We propose $\mathsf{PSCTL}$, a novel algorithm based on the Frank-Wolfe approach for addressing $k$-$\mathsf{DSS}$, effectively solving a distinct convex programming problem. $\mathsf{PSCTL}$ is able to approximate $k$-$\mathsf{DSS}$ with near-optimal guarantees. The notable advantage of $\mathsf{PSCTL}$ lies in its time complexity, which is independent of the count of $k$-cliques, resulting in remarkable efficiency in practical applications. Additionally, we present $\mathsf{CPSample}$, a sampling-based algorithm with the capability to handle networks on an unprecedented scale, reaching up to $1.8\times 10^9$ edges. By leveraging the $\mathsf{CCPATH}$ algorithm as a uniform $k$-clique sampler, $\mathsf{CPSample}$ ensures the efficient processing of large-scale network data, accompanied by a detailed analysis of accuracy guarantees. Together, these contributions represent a significant advancement in the field of $k$-clique densest subgraph discovery. In experimental evaluations, our algorithms demonstrate orders of magnitude faster performance compared to the current state-of-the-art solutions.
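
To make the Frank-Wolfe idea concrete, the following is a minimal Python sketch of the classical clique-enumerating variant on a toy graph. It is not the paper's algorithm: it lists every $k$-clique explicitly, which is exactly the cost the SCT-based approach is described as avoiding. The function names, the step size $1/(t+1)$, the brute-force clique enumeration, and the prefix-based extraction step are all assumptions made for illustration. The $k$-clique density of a node set is taken to be the number of $k$-cliques inside it divided by its size.

```python
from itertools import combinations


def make_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """Enumerate all k-cliques by brute force; only sensible for tiny graphs."""
    return [c for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def frank_wolfe_scores(adj, k, iterations=10):
    """Run Frank-Wolfe style updates on the per-node scores r(u)."""
    cliques = k_cliques(adj, k)
    # Start with every clique splitting its unit weight evenly among its k nodes.
    r = {u: 0.0 for u in adj}
    for c in cliques:
        for u in c:
            r[u] += 1.0 / k
    for t in range(1, iterations + 1):
        gamma = 1.0 / (t + 1)  # assumed step size
        # Linear step: each clique gives its whole weight to its lowest-score node,
        # judged by the scores from the start of this iteration.
        r_hat = {u: 0.0 for u in adj}
        for c in cliques:
            r_hat[min(c, key=lambda u: r[u])] += 1.0
        # Convex combination of the old scores and the greedy assignment.
        r = {u: (1 - gamma) * r[u] + gamma * r_hat[u] for u in adj}
    return r, cliques


def best_prefix(adj, k, iterations=10):
    """Sort nodes by score and return the prefix with the highest k-clique density."""
    r, cliques = frank_wolfe_scores(adj, k, iterations)
    order = sorted(adj, key=lambda u: (-r[u], u))
    position = {u: i for i, u in enumerate(order)}
    # A clique lies inside a prefix once its last-ranked node has been included.
    completed = [0] * len(order)
    for c in cliques:
        completed[max(position[u] for u in c)] += 1
    best_density, best_size, inside = 0.0, 0, 0
    for i, count in enumerate(completed):
        inside += count
        density = inside / (i + 1)
        if density > best_density:
            best_density, best_size = density, i + 1
    return set(order[:best_size]), best_density
```

For example, on a 4-clique with a short path attached, `best_prefix(make_graph([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]), 3)` recovers the four clique nodes, whose triangle density is 4 triangles over 4 nodes. Note that the per-iteration cost here is proportional to the number of $k$-cliques, which is the dependence that the abstract says $\mathsf{PSCTL}$ removes.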

Paper Structure (14 sections, 23 theorems, 4 equations, 8 figures, 6 tables, 4 algorithms)

Key Result

Lemma 1

Consider a $k$-clique $C$ and let $x$ be the node in $C$ with the smallest ranking and $y$ the node with the largest ranking. Either $r(y) = r(x)$ or $\alpha_y^C = 0$.
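
The lemma is easiest to read against the standard convex program for the $k$-clique densest subgraph from the earlier Frank-Wolfe literature. The block below is a sketch of that formulation in the lemma's notation. It is an assumption that $r$ and $\alpha$ carry exactly this meaning here, and that the lemma is stated for an optimal solution.

```latex
\begin{aligned}
\min_{\alpha}\quad & \sum_{u \in V} r(u)^2 \\
\text{s.t.}\quad & r(u) = \sum_{C \in \mathcal{C}_k(V):\, u \in C} \alpha_u^C, && \forall u \in V, \\
& \sum_{u \in C} \alpha_u^C = 1, && \forall C \in \mathcal{C}_k(V), \\
& \alpha_u^C \ge 0, && \forall C \in \mathcal{C}_k(V),\ \forall u \in C.
\end{aligned}
```

Under this reading, the lemma is an optimality condition. If $r(y) > r(x)$ and $\alpha_y^C > 0$, moving a small weight $\varepsilon$ from $\alpha_y^C$ to $\alpha_x^C$ changes the objective by about $2\varepsilon\,(r(x) - r(y)) < 0$, so the solution could not have been optimal. Each clique therefore gives weight only to its lowest-ranked members.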

Figures (8)

  • Figure 1: The three-step paradigm for $k$-$\mathsf{DSS}$.
  • Figure 2: Illustration of the SCT.
  • Figure 3: Illustration of $\mathsf{PSCTL}$ on the example graph for one iteration.
  • Figure 4: Running time of the Frank-Wolfe based algorithms ($T=10$).
  • Figure 5: Running time of different Frank-Wolfe based algorithms with varying $T$.
  • ...and 3 more figures

Theorems & Definitions (25)

  • Definition 1: $k$-clique Densest Subgraph
  • Lemma 1: kclpp
  • Lemma 2: kclppDanisch17
  • Lemma 3: PIVOTER
  • Lemma 4
  • Lemma 5: PIVOTER
  • Lemma 6
  • Theorem 1
  • Definition 2
  • Theorem 2
  • ...and 15 more