Graph-based Semi-supervised Local Clustering with Few Labeled Nodes

Zhaiming Shen; Ming-Jun Lai; Sheng Li

TL;DR

The paper tackles local clustering on graphs with only a few labeled nodes by reframing the task as sparse recovery on linear systems derived from the graph Laplacian. It introduces CS-LCE, a semi-supervised method that takes the entire graph as the initial cut and iteratively refines the target cluster via a removal set, which is built from random-walk exploration and recovered with Subspace Pursuit under sparsity constraints. The authors give theoretical guarantees showing that, under mild perturbations and RIP-like conditions, the recovered cluster closely matches the true target cluster. Experiments on synthetic and real datasets show CS-LCE consistently outperforming baselines in accuracy and efficiency. The work offers a scalable, principled framework for extracting small, meaningful structures from large graphs with limited supervision, with possible extension to deep-learning pipelines.

Abstract

Local clustering aims at extracting a local structure inside a graph without needing to know the entire graph structure. Because the local structure is usually small compared to the entire graph, one can treat the task as a compressive sensing problem in which the indices of the target cluster form a sparse solution to a linear system. In this paper, we apply this idea, building on two pioneering works under the same framework, and propose a new semi-supervised local clustering approach that uses only a few labeled nodes. Our approach improves on the existing works by taking the initial cut to be the entire graph, and hence overcomes their major limitation, namely the low quality of the initial cut. Extensive experimental results on various datasets demonstrate the effectiveness of our approach.

Paper Structure

This paper contains 21 sections, 5 theorems, 15 equations, 5 figures, 6 tables, 1 algorithm.

Introduction
Preliminaries
Graph Notations and Concepts
Compressive Sensing
Problem Statement
Local Clustering via Compressive Sensing
Main Algorithm
Theoretical Analysis
Experiments
Datasets.
Baselines and Settings.
Simulated Data
Symmetric Stochastic Block Model.
Non-symmetric Stochastic Block Model.
Geometric Data.
...and 6 more sections

Key Result

Lemma 1

Let $G$ be an undirected graph with non-negative weights. The multiplicity $k$ of the eigenvalue zero of $L$ equals the number of connected components $C_1, C_2, \cdots, C_k$ in $G$, and the indicator vectors $\mathbf{1}_{C_1}, \cdots, \mathbf{1}_{C_k}\in\mathbb{R}^n$ on these components span the k
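
The part of Lemma 1 that is visible here can be checked on a small graph. The sketch below is illustrative only and is not taken from the paper. It assumes that $L$ is the unnormalized Laplacian $L = D - A$ (the excerpt does not say which Laplacian is meant), and the toy graph and the helper names `laplacian`, `rank`, and `components` are hypothetical. It uses exact rational arithmetic, so the comparison between the nullity of $L$ and the number of components involves no numerical tolerance.

```python
from fractions import Fraction


def laplacian(n, edges):
    """Unnormalized Laplacian L = D - A (assumed form) for edges (u, v, weight)."""
    L = [[Fraction(0)] * n for _ in range(n)]
    for u, v, w in edges:
        w = Fraction(w)
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    return L


def rank(matrix):
    """Exact rank by Gaussian elimination over the rationals."""
    m = [row[:] for row in matrix]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, n_rows):
            if m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def components(n, edges):
    """Connected components found by depth-first search."""
    adj = {i: set() for i in range(n)}
    for u, v, w in edges:
        if w > 0:
            adj[u].add(v)
            adj[v].add(u)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(sorted(comp))
    return comps


# Hypothetical toy graph: a weighted triangle {0, 1, 2}, an edge {3, 4},
# and an isolated node 5, so there are three connected components.
n = 6
edges = [(0, 1, 1), (1, 2, 2), (0, 2, 1), (3, 4, 3)]

L = laplacian(n, edges)
comps = components(n, edges)

# Multiplicity of eigenvalue zero = dimension of the null space = n - rank(L).
nullity = n - rank(L)

# L applied to each component indicator vector: entry i is the sum of L[i][j] over j in C.
residuals = [[sum(L[i][j] for j in comp) for i in range(n)] for comp in comps]

print(nullity, len(comps))
print(all(x == 0 for row in residuals for x in row))
```

For this graph the nullity and the number of components agree, and each indicator vector is mapped to the zero vector by $L$, which is the behaviour the lemma describes.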

Figures (5)

Figure 1: Performances on Symmetric Stochastic Block Model. Top: Average Jaccard Index. Bottom: Logarithm of Average Run Time.
Figure 2: Performances on Non-symmetric Stochastic Block Model. Top: Average Jaccard Index. Bottom: Logarithm of Average Run Time.
Figure 3: Visualizations of Geometric Data. From Left to Right: Three Lines, Three Circles, and Three Moons.
Figure 4: Left: Randomly Permuted AT&T Faces. Right: Desired Recovery of all Clusters.
Figure 5: Average Jaccard Index on OptDigits.

Theorems & Definitions (14)

Definition 1
Lemma 1
Remark 1
Definition 2
Remark 2
Remark 3
Theorem 1
proof
Theorem 2
proof
...and 4 more
