DIST: Efficient k-Clique Listing via Induced Subgraph Trie

Yehyun Nam, Jihoon Jang, Kunsoo Park, Jianye Yang, Cheng Long

TL;DR

This work tackles the challenging problem of listing all $k$-cliques in large graphs. It introduces the Induced Subgraph Trie to memoize and efficiently retrieve cliques, coupled with a pruning mechanism based on soft embeddings of $l$-trees and a density-aware ListingDense routine for dense subgraphs. Empirical results on 16 real networks show DIST substantially outperforms state-of-the-art methods in both running time and memory usage, including notable gains on graphs with large maximum clique sizes and in parallel execution. The approach enables scalable, exact enumeration of $k$-cliques and offers a practical memory management strategy, suggesting broad applicability to cohesive subgraph mining tasks.

Abstract

Listing k-cliques plays a fundamental role in various data mining tasks, such as community detection and mining of cohesive substructures. Existing algorithms for the k-clique listing problem are built upon a general framework, which finds k-cliques by recursively finding (k-1)-cliques within subgraphs induced by the out-neighbors of each vertex. However, this framework is inherently inefficient because it repeatedly finds smaller cliques within certain subgraphs. In this paper, we propose an algorithm, DIST, for the k-clique listing problem. In contrast to existing works, the main idea of our approach is to compute each clique in the given graph only once and store it in a data structure called the Induced Subgraph Trie, which allows us to retrieve the cliques efficiently. Furthermore, we propose a method to prune the search space based on a novel concept called the soft embedding of an l-tree, which further improves the running time. We show the superiority of our approach in terms of time and space usage through comprehensive experiments conducted on real-world networks; DIST outperforms the state-of-the-art algorithm by up to two orders of magnitude in both single-threaded and parallel experiments.
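To make the general framework mentioned in the abstract concrete, the following is a minimal Python sketch of the baseline approach, not of DIST itself: orient the graph into a DAG using a degeneracy ordering, then list k-cliques by recursively listing (k-1)-cliques among the out-neighbors of each vertex. The function names (`degeneracy_order`, `orient`, `list_k_cliques`) and the adjacency-dictionary representation are assumptions made for illustration and do not come from the paper.

```python
def degeneracy_order(adj):
    """Return a degeneracy ordering by repeatedly removing a minimum-degree vertex.

    `adj` maps each vertex to the set of its neighbors (undirected graph).
    This simple version is quadratic; it is meant for illustration only.
    """
    deg = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    order = []
    while remaining:
        # Ties are broken by vertex id so the ordering is deterministic.
        v = min(remaining, key=lambda u: (deg[u], u))
        order.append(v)
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                deg[u] -= 1
    return order


def orient(adj):
    """Direct every edge from the earlier to the later vertex in the ordering."""
    rank = {v: i for i, v in enumerate(degeneracy_order(adj))}
    return {v: {u for u in adj[v] if rank[u] > rank[v]} for v in adj}


def list_k_cliques(out, k):
    """List every k-clique exactly once using the recursive out-neighbor framework.

    `out` maps each vertex to its set of out-neighbors in the DAG.
    """
    result = []

    def recurse(candidates, size, prefix):
        if size == 1:
            # Every remaining candidate completes the clique.
            result.extend(prefix + [v] for v in candidates)
            return
        for v in candidates:
            # Restrict to the subgraph induced by the out-neighbors of v.
            recurse(candidates & out[v], size - 1, prefix + [v])

    recurse(set(out), k, [])
    return result
```

Calling `list_k_cliques(orient(adj), k)` on an undirected adjacency dictionary `adj` returns each k-clique once as a list of vertices. Note that the same candidate sets can be reached through different prefixes, so smaller cliques are recomputed; this is the repetition the abstract describes and that DIST is designed to avoid.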

Paper Structure

This paper contains 14 sections, 3 theorems, 2 equations, 11 figures, 3 tables, 5 algorithms.

Key Result

Theorem 1

Algorithm 2 lists all $k$-cliques in $\vec{G}$.

Figures (11)

  • Figure 1: Two $k$-clique communities at $k = 4$. A $k$-clique community is a union of $k$-cliques adjacent to each other, where adjacency means sharing $k - 1$ vertices.
  • Figure 2: Graph $G$ and DAG $\vec{G}$ based on degeneracy ordering.
  • Figure 3: Induced Subgraph Trie for the DAG $\vec{G}$ in Figure 2. A red arc (resp. blue arc) labeled $l$ outgoing from a node $t$ is the $l$-child link (resp. $l$-sibling link) of $t$.
  • Figure 4: DAG $\vec{G}$, $4$-tree, and two soft embeddings $f$ and $g$.
  • Figure 5: Running time of algorithms on small-$\omega$ graphs.
  • ...and 6 more figures
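
The $k$-clique community notion in the Figure 1 caption can be illustrated with a short Python sketch. It assumes that adjacency is applied transitively, as in standard clique percolation, so a community is the union of all $k$-cliques connected through chains of cliques sharing $k - 1$ vertices. The function name `k_clique_communities` and the input format (a list of already listed $k$-cliques) are hypothetical and not taken from the paper.

```python
from itertools import combinations


def k_clique_communities(cliques, k):
    """Group k-cliques into communities; two cliques are adjacent if they share k-1 vertices.

    `cliques` is a list of distinct k-cliques, each an iterable of k vertices.
    Returns a list of vertex sets, one per community.
    """
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_owner = {}  # (k-1)-subset -> index of the first clique containing it
    for i, clique in enumerate(cliques):
        for sub in combinations(sorted(clique), k - 1):
            j = first_owner.setdefault(sub, i)
            if j != i:
                # Cliques i and j share k-1 vertices, so they are adjacent.
                parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())
```

The input would typically be the output of a $k$-clique listing routine, which is why efficient listing matters for this kind of community detection.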

Theorems & Definitions (19)

  • Definition 1: $k$-clique listing
  • Example 1
  • Definition 2
  • Example 2
  • Example 3
  • Definition 3
  • Definition 4
  • Example 4
  • Definition 5
  • Theorem 1
  • ...and 9 more