Graph-based Semi-supervised Local Clustering with Few Labeled Nodes

Zhaiming Shen, Ming-Jun Lai, Sheng Li

TL;DR

The paper tackles local clustering on graphs using only a few labeled nodes by reframing the task as sparse recovery on Laplacian-derived linear systems. It introduces CS-LCE, a semi-supervised method that takes the entire graph as the initial cut and iteratively refines the target cluster: a removal set is built from random-walk exploration and identified with Subspace Pursuit under sparsity constraints. The authors give theoretical guarantees that, under mild perturbations and RIP-like conditions, the recovered cluster closely matches the true target cluster. Experiments on synthetic and real datasets show CS-LCE consistently outperforming the baselines in accuracy and efficiency. The work offers a scalable, principled framework for extracting small, meaningful structures from large graphs with limited supervision, and it could potentially be incorporated into deep-learning pipelines.

Abstract

Local clustering aims at extracting a local structure inside a graph without the necessity of knowing the entire graph structure. As the local structure is usually small compared to the entire graph, one can think of it as a compressive sensing problem in which the indices of the target cluster form a sparse solution to a linear system. In this paper, we apply this idea based on two pioneering works under the same framework and propose a new semi-supervised local clustering approach using only a few labeled nodes. Our approach improves on the existing works by taking the initial cut to be the entire graph, and hence overcomes a major limitation of the existing works, namely the low quality of the initial cut. Extensive experimental results on various datasets demonstrate the effectiveness of our approach.

Paper Structure

This paper contains 21 sections, 5 theorems, 15 equations, 5 figures, 6 tables, 1 algorithm.

Key Result

Lemma 1

Let $G$ be an undirected graph with non-negative weights. The multiplicity $k$ of the eigenvalue zero of $L$ equals the number of connected components $C_1, C_2, \cdots, C_k$ in $G$, and the indicator vectors $\mathbf{1}_{C_1}, \cdots, \mathbf{1}_{C_k}\in\mathbb{R}^n$ on these components span the k...
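
The first part of Lemma 1 can be checked on a small example. The sketch below is illustrative only and is not taken from the paper: it assumes $L$ is the unnormalized graph Laplacian $L = D - A$, and the helper names (`laplacian`, `nullity`, `components`) and the toy graph are hypothetical. It computes the multiplicity of the zero eigenvalue as the nullity of $L$ (valid because $L$ is symmetric), counts connected components by graph search, and confirms that each component's indicator vector satisfies $L\mathbf{1}_{C_i} = 0$.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Unnormalized Laplacian L = D - A of an undirected weighted graph."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L


def nullity(M):
    """Dimension of the null space, via exact Gaussian elimination."""
    A = [row[:] for row in M]
    n_rows, n_cols = len(A), len(A[0])
    rank = 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, n_rows) if A[r][col] != 0), None)
        if pivot is None:
            continue
        A[rank], A[pivot] = A[pivot], A[rank]
        for r in range(rank + 1, n_rows):
            factor = A[r][col] / A[rank][col]
            if factor != 0:
                A[r] = [a - factor * b for a, b in zip(A[r], A[rank])]
        rank += 1
    return n_cols - rank


def components(n, edges):
    """Connected components, found by depth-first search."""
    adj = {i: set() for i in range(n)}
    for u, v, w in edges:
        if w > 0:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            x = stack.pop()
            if x in comp:
                continue
            comp.add(x)
            stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps


# Toy graph (hypothetical): a weighted triangle, a single edge, an isolated node.
n = 6
edges = [(0, 1, 1), (1, 2, 2), (0, 2, 1), (3, 4, 3)]

L = laplacian(n, edges)
comps = components(n, edges)
k = nullity(L)  # multiplicity of eigenvalue 0, since L is symmetric

# Each component indicator vector should be mapped to zero by L.
indicators = [[1 if i in c else 0 for i in range(n)] for c in comps]
in_kernel = all(
    sum(L[r][j] * ind[j] for j in range(n)) == 0
    for ind in indicators
    for r in range(n)
)

print(k, len(comps), in_kernel)
```

Exact rational arithmetic is used here so that the rank computation is not affected by floating-point tolerance; on larger graphs a numerical eigenvalue solver would be the more practical choice.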

Figures (5)

  • Figure 1: Performances on Symmetric Stochastic Block Model. Top: Average Jaccard Index. Bottom: Logarithm of Average Run Time.
  • Figure 2: Performances on Non-symmetric Stochastic Block Model. Top: Average Jaccard Index. Bottom: Logarithm of Average Run Time.
  • Figure 3: Visualizations of Geometric Data. From Left to Right: Three Lines, Three Circles, and Three Moons.
  • Figure 4: Left: Randomly Permuted AT&T Faces. Right: Desired Recovery of all Clusters.
  • Figure 5: Average Jaccard Index on OptDigits.
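
The figures above report the Jaccard index between a recovered cluster and the ground-truth cluster. As a minimal sketch, assuming the standard definition $|A \cap B| / |A \cup B|$ (the paper's exact evaluation protocol is not shown on this page, and the function name and node sets below are hypothetical), it can be computed as follows:

```python
def jaccard_index(recovered, truth):
    """Standard Jaccard index |A ∩ B| / |A ∪ B| between two node sets."""
    recovered, truth = set(recovered), set(truth)
    union = recovered | truth
    if not union:
        # Convention chosen here: two empty sets are treated as identical.
        return 1.0
    return len(recovered & truth) / len(union)


# Hypothetical example: the recovered cluster misses node 0 and wrongly adds node 4.
truth = {0, 1, 2, 3}
recovered = {1, 2, 3, 4}
score = jaccard_index(recovered, truth)
print(score)
```

A score of 1 means the recovered cluster matches the target exactly, and a score of 0 means the two sets share no nodes.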

Theorems & Definitions (14)

  • Definition 1
  • Lemma 1
  • Remark 1
  • Definition 2
  • Remark 2
  • Remark 3
  • Theorem 1
  • proof
  • Theorem 2
  • proof
  • ...and 4 more