Quasi-Clique Discovery via Energy Diffusion

Yu Zhang, Yilong Luo, Mingyuan Ma, Yao Chen, Enqiang Zhu, Jin Xu, Chanjuan Liu

TL;DR

Quasi-clique discovery in large graphs is NP-hard, and existing methods are often sensitive to random seeds. This work introduces EDQC, which combines an adaptive energy-diffusion phase that concentrates mass in cohesive regions with conductance-guided extraction and a refinement step that guarantees the output is a $\gamma$-quasi-clique. Empirical results on 75 real-world graphs show that EDQC finds larger quasi-cliques on most datasets, with low variance across runs and runtimes competitive with state-of-the-art baselines, and statistical tests support the improvement. The approach offers a robust, density-controlled alternative for dense subgraph discovery in large-scale networks, with potential applications in fraud detection, web spam detection, and recommendation.

Abstract

Discovering quasi-cliques -- subgraphs whose edge density exceeds a given threshold -- is a fundamental task in graph mining with applications to web spam detection, fraud screening, and e-commerce recommendation. However, existing methods for quasi-clique discovery on large-scale web graphs are often sensitive to random seeds or lack explicit edge-density guarantees, making the task challenging in practice. This paper presents EDQC, an energy diffusion-based method for quasi-clique discovery. EDQC first employs an adaptive energy diffusion process to generate an energy ranking that highlights structurally cohesive regions. Guided by this energy ranking, the algorithm identifies a high-quality subgraph by minimizing conductance, a standard measure from community detection. This subgraph is then refined to meet the specified density threshold. Extensive experiments on 75 real-world graphs show that EDQC finds larger quasi-cliques on most datasets, with consistently lower variance across runs and competitive runtime. To the best of our knowledge, EDQC is the first method to incorporate energy diffusion into quasi-clique discovery.
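The extraction and refinement stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the energy ranking is taken as a given input, `sweep_cut` is a generic prefix sweep that picks the lowest-conductance prefix, and `refine` uses an assumed greedy rule (drop the vertex with the fewest internal neighbors) that the paper may not use. All function names and the toy graph are hypothetical.

```python
def edge_density(adj, nodes):
    """Fraction of vertex pairs inside `nodes` that are joined by an edge."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 1.0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    internal = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return internal / (n * (n - 1) / 2)


def conductance(adj, nodes):
    """Cut edges divided by the smaller of the two side volumes."""
    nodes = set(nodes)
    cut = sum(1 for u in nodes for v in adj[u] if v not in nodes)
    vol_in = sum(len(adj[u]) for u in nodes)
    vol_out = sum(len(adj[u]) for u in adj) - vol_in
    denom = min(vol_in, vol_out)
    return cut / denom if denom > 0 else float("inf")


def sweep_cut(adj, ranking):
    """Return the proper prefix of `ranking` with the lowest conductance."""
    best, best_phi = list(ranking), float("inf")
    for k in range(2, len(ranking)):
        phi = conductance(adj, ranking[:k])
        if phi < best_phi:
            best, best_phi = ranking[:k], phi
    return set(best)


def refine(adj, nodes, gamma):
    """Assumed greedy refinement: peel low-degree vertices until density >= gamma."""
    nodes = set(nodes)
    while len(nodes) > 1 and edge_density(adj, nodes) < gamma:
        # Remove the vertex with the fewest neighbors inside the set (ties: smallest id).
        worst = min(nodes, key=lambda u: (sum(v in nodes for v in adj[u]), u))
        nodes.remove(worst)
    return nodes


# Toy graph: vertices 0-4 form a clique missing edge (0, 1); 4-5-6-7 is a tail.
adj = {
    0: [2, 3, 4], 1: [2, 3, 4], 2: [0, 1, 3, 4], 3: [0, 1, 2, 4],
    4: [0, 1, 2, 3, 5], 5: [4, 6], 6: [5, 7], 7: [6],
}
ranking = [0, 1, 2, 3, 4, 5, 6, 7]  # stand-in for an energy ranking

candidate = sweep_cut(adj, ranking)
print(sorted(candidate))                     # [0, 1, 2, 3, 4]
print(edge_density(adj, candidate))          # 0.9
print(sorted(refine(adj, candidate, 1.0)))   # [1, 2, 3, 4]
```

On this toy graph the sweep isolates the near-clique because only one edge leaves it. Refinement leaves it unchanged for any threshold up to 0.9 and shrinks it to a 4-clique when the threshold is 1.0.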

Paper Structure

This paper contains 18 sections, 6 theorems, 6 equations, 7 figures, 3 tables, 3 algorithms.

Key Result

Proposition 1

The total energy is conserved in each round: $\sum_{x \in V} f'(x) = \sum_{x \in V} f(x)$.
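To make the conservation property concrete, here is a minimal sketch of one diffusion round on a generic graph. It assumes a simple rule that is not taken from the paper: each vertex retains a fraction `retain(u)` of its energy and splits the remainder evenly among its neighbors, while isolated vertices keep everything. The function `diffusion_round` and the retention rule are hypothetical stand-ins for EDQC's adaptive update. Exact rational arithmetic is used so the sums can be compared without rounding error.

```python
from fractions import Fraction


def diffusion_round(adj, f, retain):
    """One round of a generic mass-conserving diffusion (assumed rule, not EDQC's)."""
    new_f = {u: Fraction(0) for u in adj}
    for u, energy in f.items():
        neighbors = adj[u]
        if not neighbors:
            # An isolated vertex has nowhere to send energy, so it keeps all of it.
            new_f[u] += energy
            continue
        keep = retain(u) * energy
        new_f[u] += keep
        # The remainder is split evenly, so everything sent is received by someone.
        share = (energy - keep) / len(neighbors)
        for v in neighbors:
            new_f[v] += share
    return new_f


# Path 0-1-2 plus an isolated vertex 3.
graph = {0: [1], 1: [0, 2], 2: [1], 3: []}
f = {0: Fraction(1), 1: Fraction(0), 2: Fraction(0), 3: Fraction(2)}

# Hypothetical retention rule: higher-degree vertices keep a smaller fraction.
def retain(u):
    return Fraction(1, 1 + len(graph[u]))

total_before = sum(f.values())
for _ in range(5):
    f = diffusion_round(graph, f, retain)
print(sum(f.values()) == total_before)  # True
```

Conservation holds here because each vertex's outgoing shares plus its retained part add up to exactly the energy it started the round with. Any retention fraction in $[0, 1]$ preserves the total under this rule.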

Figures (7)

  • Figure 1: A clique and a 0.8-quasi-clique.
  • Figure 2: Runtime comparison of EDQC against baselines on a log-log scale.
  • Figure 3: Critical Difference diagrams for algorithm ranks on all 75 datasets (significance level 0.05). A horizontal bar connects algorithms with no statistically significant difference.
  • Figure 4: Subgraph density vs. total retained energy on eight representative datasets. Each blue dot represents one of 1,000 randomly sampled subgraphs of a fixed size; the red marker shows the quasi-clique of the same size found by EDQC.
  • Figure 5: Parameter sensitivity analysis for EDQC on the small representative graph (ego-facebook). Each of the three rows corresponds to a different density threshold $\gamma$. For each $\gamma$, the left plot shows the resulting quasi-clique size versus the activation threshold $\theta$ (on a log scale), and the right plot shows the corresponding runtime.
  • ...and 2 more figures

Theorems & Definitions (15)

  • Definition 1: $\gamma$-Quasi-Clique
  • Definition 2: Maximum Quasi-Clique Problem
  • Definition 3: Conductance
  • Proposition 1: Energy Conservation
  • proof
  • Proposition 2: Properties of $\alpha(u)$
  • proof
  • Proposition 3: Unbiased Distribution in Expectation
  • proof
  • Theorem 1: Complexity of the energy diffusion algorithm
  • ...and 5 more
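
For orientation, the notions named in Definitions 1 and 3 are commonly written as follows. These are the standard textbook forms, stated here as an assumption; the paper's own definitions may differ in detail, for example in whether the density inequality is strict. The worked instance uses a hypothetical 5-vertex subgraph with 8 edges, which matches the 0.8 threshold mentioned in the caption of Figure 1.

```latex
% Edge density of a vertex set S and the gamma-quasi-clique condition
% (standard form; assumed, not quoted from the paper).
\[
  \delta(S) = \frac{|E(S)|}{\binom{|S|}{2}}, \qquad
  S \text{ is a } \gamma\text{-quasi-clique} \iff \delta(S) \ge \gamma .
\]

% Worked instance (hypothetical): |S| = 5 and |E(S)| = 8.
\[
  \delta(S) = \frac{8}{\binom{5}{2}} = \frac{8}{10} = 0.8 .
\]

% Conductance of S, where vol(S) is the sum of degrees of vertices in S.
\[
  \phi(S) = \frac{|E(S, V \setminus S)|}
                 {\min\{\mathrm{vol}(S),\, \mathrm{vol}(V \setminus S)\}},
  \qquad \mathrm{vol}(S) = \sum_{v \in S} \deg(v).
\]
```

A clique is the special case $\gamma = 1$, where every pair of vertices in $S$ is adjacent.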