Efficient Defective Clique Enumeration and Search with Worst-Case Optimal Search Space

Jihoon Jang, Yehyun Nam, Kunsoo Park, Hyunjoon Kim

TL;DR

The paper tackles two NP-hard problems on graphs, enumerating maximal $k$-defective cliques and searching for a maximum $k$-defective clique, motivated by noise and incomplete data in real networks. It introduces a clique-first branch-and-bound framework with a novel pivoting technique whose search space is worst-case optimal, and it uses the diameter-two property of defective cliques to decompose the problem and shrink the search space further. The authors also develop practical enhancements, including an efficient initial-solution computation and graph-reduction techniques. Together these deliver speedups of up to four orders of magnitude over state-of-the-art algorithms on large real-world graphs. The experiments demonstrate practical scalability, which makes the approach suitable for large-scale network analysis tasks such as link prediction and community detection.

Abstract

A $k$-defective clique is a relaxation of the traditional clique definition, allowing up to $k$ missing edges. This relaxation is crucial in various real-world applications such as link prediction, community detection, and social network analysis. Although the problems of enumerating maximal $k$-defective cliques and searching for a maximum $k$-defective clique have been extensively studied, existing algorithms suffer from limitations such as the combinatorial explosion of small partial solutions and sub-optimal search spaces. To address these limitations, we propose a novel clique-first branch-and-bound framework that first generates cliques and then adds missing edges. Furthermore, we introduce a new pivoting technique that achieves a search space size of $\mathcal{O}(3^{\frac{n}{3}} \cdot n^k)$, where $n$ is the number of vertices in the input graph. We prove that the worst-case number of maximal $k$-defective cliques is $\Omega(3^{\frac{n}{3}} \cdot n^k)$ when $k$ is a constant, establishing that our algorithm's search space is worst-case optimal. Leveraging the diameter-two property of defective cliques, we further reduce the search space size to $\mathcal{O}(n \cdot 3^{\frac{\delta}{3}} \cdot (\delta\Delta)^k)$, where $\delta$ is the degeneracy and $\Delta$ is the maximum degree of the input graph. We also propose an efficient framework for maximum $k$-defective clique search based on our branch-and-bound, together with practical techniques to reduce the search space. Experiments on real-world benchmark datasets with more than 1 million edges demonstrate that each of our proposed algorithms for maximal $k$-defective clique enumeration and maximum $k$-defective clique search outperforms the respective state-of-the-art algorithms by up to four orders of magnitude in terms of processing time.
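
To make the definition concrete, the following minimal sketch checks whether a vertex set is a $k$-defective clique by counting the vertex pairs that lack an edge. The helper names (`build_adjacency`, `missing_edges`, `is_k_defective_clique`) and the toy graph are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def missing_edges(adj, vertices):
    """Count unordered pairs of `vertices` that are not joined by an edge."""
    return sum(1 for u, v in combinations(sorted(vertices), 2) if v not in adj[u])


def is_k_defective_clique(adj, vertices, k):
    """A vertex set is a k-defective clique if it misses at most k edges."""
    return missing_edges(adj, vertices) <= k


# Hypothetical toy graph: a 4-cycle 1-2-3-4 with the chord 1-3.
adj = build_adjacency([(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)])

print(missing_edges(adj, {1, 2, 3, 4}))             # only the pair (2, 4) is missing
print(is_k_defective_clique(adj, {1, 2, 3, 4}, 1))  # within the budget k = 1
print(is_k_defective_clique(adj, {1, 2, 3, 4}, 0))  # not an ordinary clique
```

With $k = 0$ the check reduces to the ordinary clique test, which is why the four-vertex set fails for $k = 0$ but passes for $k = 1$.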

Paper Structure

This paper contains 52 sections, 9 theorems, 11 equations, 13 figures, 5 tables, 3 algorithms.

Key Result

Lemma 3.1

Given an instance $(S, C, X)$ of a recursive call where $N_C(S) \neq \emptyset$, let $p$ be a vertex in $N_C(S)$. Every maximal $k$-defective clique that contains $S$ and is contained in $S \cup C$ must include at least one vertex of $\overline{N}_C[p]$.
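
The lemma can be checked by brute force on a small instance. The sketch below rests on assumptions about notation that this summary does not spell out: $N_C(S)$ is read as the vertices of $C$ adjacent to every vertex of $S$, $\overline{N}_C[p]$ as $p$ together with the vertices of $C$ not adjacent to $p$, and maximality is taken within $S \cup C$. The graph and all function names are hypothetical, and the exhaustive enumeration stands in for the paper's branch-and-bound only for illustration.

```python
from itertools import combinations


def missing_edges(adj, vertices):
    """Count unordered pairs of `vertices` that are not joined by an edge."""
    return sum(1 for u, v in combinations(sorted(vertices), 2) if v not in adj[u])


def maximal_defective_cliques(adj, S, C, k):
    """Brute force: every D with S <= D <= S | C that is k-defective and
    cannot be extended by any further vertex of C."""
    S, C = set(S), set(C)
    found = []
    for r in range(len(C) + 1):
        for extra in combinations(sorted(C), r):
            D = S | set(extra)
            if missing_edges(adj, D) > k:
                continue
            if any(missing_edges(adj, D | {v}) <= k for v in C - D):
                continue  # D can still grow, so it is not maximal
            found.append(D)
    return found


def common_neighbours(adj, C, S):
    """Assumed reading of N_C(S): vertices of C adjacent to all of S."""
    return {v for v in C if all(v in adj[s] for s in S)}


def closed_non_neighbourhood(adj, C, p):
    """Assumed reading of the closed non-neighbourhood of p in C."""
    return {v for v in C if v == p or v not in adj[p]}


# Hypothetical instance with S = {0} and C = {1, ..., 5}.
adj = {v: set() for v in range(6)}
for u, v in [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]:
    adj[u].add(v)
    adj[v].add(u)

S, C, k = {0}, {1, 2, 3, 4, 5}, 1
solutions = maximal_defective_cliques(adj, S, C, k)
for p in common_neighbours(adj, C, S):
    branch = closed_non_neighbourhood(adj, C, p)
    # Every maximal solution must hit the closed non-neighbourhood of the pivot.
    assert all(D & branch for D in solutions)
```

The intuition under this reading is that a solution avoiding $\overline{N}_C[p]$ consists only of neighbours of $p$, so adding $p$ would introduce no new missing edge and the solution could not have been maximal.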

Figures (13)

  • Figure 1: A noisy protein-protein interaction (PPI) network [bader2002analyzing] containing two defective cliques as subgraphs (highlighted in blue). Missing interactions in the noisy network can be predicted by identifying defective cliques and completing the missing edges (represented by red dashed edges). After completing the missing edges, two protein complexes can be identified (left: Casein Kinase II, right: Exosome complex), each forming a clique.
  • Figure 2: An example graph $G$ and search trees to enumerate all maximal 1-defective cliques of size at least 4 in $G$. The label of each search tree node represents the branching vertex that is last added to the partial solution, and the vertices from the root to each node form the partial solution of that node. A node with a dashed outline indicates that adding the branching vertex introduces missing edges in its solution, and a gray-shaded node indicates that its partial solution is maximal in $G$.
  • Figure 3: The complement graph of the Moon-Moser graph with 12 vertices. The set $S$, consisting of six vertices and containing two missing edges (highlighted in red), is a maximal 2-defective clique in the Moon-Moser graph.
  • Figure 4: A diagram illustrating the definitions of $C_i$ and $B_i$ for $I = (S, C, X)$, where $S = \{u_1\}$, $C = \{u_2, \ldots, u_8\}$, and $\kappa = k = 1$. The vertex $u_2$ is selected as the pivot vertex, and $B$ is set to $\overline{N}_C[u_1] = \{u_2, u_3, u_4, u_6, u_7\}$.
  • Figure 5: An example graph and its coloring $\chi$ (best viewed in color) for applying our practical techniques.
  • ...and 8 more figures
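
The Moon-Moser example in Figure 3 can be reproduced by exhaustive search. The sketch below builds the 12-vertex Moon-Moser graph as a complete multipartite graph with four parts of size three (so that its complement is four disjoint triangles) and counts maximal $k$-defective cliques for small $k$. The function names are illustrative assumptions, and the brute-force counting is not the paper's algorithm or its lower-bound proof.

```python
from itertools import combinations


def moon_moser_graph(num_parts):
    """Complete multipartite graph with `num_parts` parts of size 3.
    Vertices u and v are adjacent exactly when they lie in different parts."""
    n = 3 * num_parts
    return {u: {v for v in range(n) if v // 3 != u // 3} for u in range(n)}


def missing_edges(adj, vertices):
    """Count unordered pairs of `vertices` that are not joined by an edge."""
    return sum(1 for u, v in combinations(sorted(vertices), 2) if v not in adj[u])


def count_maximal_defective_cliques(adj, k):
    """Count non-empty vertex sets that miss at most k edges and cannot be
    extended by any other vertex without exceeding k missing edges."""
    vertices = sorted(adj)
    count = 0
    for r in range(1, len(vertices) + 1):
        for subset in combinations(vertices, r):
            D = set(subset)
            if missing_edges(adj, D) > k:
                continue
            if all(missing_edges(adj, D | {v}) > k for v in adj.keys() - D):
                count += 1
    return count


graph = moon_moser_graph(4)  # 12 vertices
counts = {k: count_maximal_defective_cliques(graph, k) for k in (0, 1, 2)}
print(counts)  # {0: 81, 1: 324, 2: 486}
```

For $k = 0$ the count is $3^{12/3} = 81$, the classical Moon-Moser number of maximal cliques. For $k = 2$ each maximal solution takes two vertices from two of the parts and one vertex from each remaining part, which gives six vertices and two missing edges, matching the set $S$ described in the caption.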

Theorems & Definitions (19)

  • Definition 2.1: $k$-defective clique
  • Definition 2.2: Degeneracy ordering [matula1983smallest]
  • Definition 2.3: $s$-core [seidman1983network]
  • Example 3.1
  • Lemma 3.1: $k$-defective clique pivoting
  • Example 3.2
  • Theorem 3.1
  • Lemma 3.2
  • Theorem 3.2
  • Theorem 3.3
  • ...and 9 more
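
Definitions 2.2 and 2.3 refer to standard graph notions, which can be sketched as follows. The code uses the usual textbook formulations (repeatedly removing a minimum-degree vertex, and peeling vertices of degree below $s$); the paper's exact statements are not reproduced in this summary, so the function names and the simple quadratic-time implementation are assumptions made for illustration.

```python
def degeneracy_ordering(adj):
    """Return (ordering, degeneracy) by repeatedly removing a vertex of
    minimum degree in the remaining graph (ties broken by vertex id)."""
    remaining = {v: set(nbrs) for v, nbrs in adj.items()}
    order, degeneracy = [], 0
    while remaining:
        v = min(remaining, key=lambda u: (len(remaining[u]), u))
        degeneracy = max(degeneracy, len(remaining[v]))
        order.append(v)
        for w in remaining.pop(v):
            remaining[w].discard(v)
    return order, degeneracy


def s_core(adj, s):
    """Return the vertex set of the s-core: the largest induced subgraph
    in which every vertex has degree at least s (possibly empty)."""
    core = {v: set(nbrs) for v, nbrs in adj.items()}
    changed = True
    while changed:
        low = [v for v in core if len(core[v]) < s]
        changed = bool(low)
        for v in low:
            for w in core.pop(v):
                if w in core:
                    core[w].discard(v)
    return set(core)


# Hypothetical toy graph: triangle 0-1-2 with a path 2-3-4 attached.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

order, delta = degeneracy_ordering(adj)
print(order, delta)    # [4, 3, 0, 1, 2] 2
print(s_core(adj, 2))  # {0, 1, 2}
```

In a degeneracy ordering every vertex has at most $\delta$ neighbours that appear later, which is the property that bounds of the form $3^{\frac{\delta}{3}}$ rely on.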