Graph-based Semi-supervised Local Clustering with Few Labeled Nodes
Zhaiming Shen, Ming-Jun Lai, Sheng Li
TL;DR
The paper tackles local clustering on graphs using only a few labeled nodes by reframing the task as sparse recovery on Laplacian-derived linear systems. It introduces CS-LCE, a semi-supervised method that takes the full graph as its initial cut and iteratively refines the target cluster via a removal set built from random-walk exploration, with the resulting sparsity-constrained problems solved by Subspace Pursuit. The authors provide theoretical guarantees showing that, under mild perturbations and RIP-like conditions, the recovered cluster closely matches the true target cluster. They validate the approach with extensive experiments on synthetic and real datasets, where CS-LCE consistently outperforms baselines in accuracy and efficiency. The work offers a scalable, principled framework for extracting small, meaningful structures from large graphs with limited supervision, and it could potentially be incorporated into deep-learning pipelines.
Abstract
Local clustering aims to extract a local structure inside a graph without needing to know the entire graph structure. As the local structure is usually small compared to the entire graph, one can think of it as a compressive sensing problem in which the indices of the target cluster form a sparse solution to a linear system. In this paper, we build on two pioneering works under the same framework and propose a new semi-supervised local clustering approach that uses only a few labeled nodes. Our approach improves on the existing works by taking the initial cut to be the entire graph, and hence overcomes one of their major limitations, namely the low quality of the initial cut. Extensive experimental results on various datasets demonstrate the effectiveness of our approach.
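To make the compressive sensing view concrete, the following is a minimal sketch on an idealized toy graph, not the authors' CS-LCE algorithm. It assumes a noise-free graph made of two disjoint cliques, so the indicator vector of the target cluster lies in the null space of the unnormalized Laplacian L. Given one labeled seed node, the remaining cluster members are then the sparse solution x of L[:, rest] x = -L[:, seeds] 1. The helper names `toy_laplacian` and `recover_cluster` are hypothetical, and an exhaustive search over supports stands in for Subspace Pursuit, which is only feasible because the example is tiny.

```python
import itertools
import numpy as np


def toy_laplacian(sizes):
    """Unnormalized Laplacian L = D - A of disjoint cliques with the given sizes."""
    n = sum(sizes)
    A = np.zeros((n, n))
    start = 0
    for size in sizes:
        A[start:start + size, start:start + size] = 1.0
        start += size
    np.fill_diagonal(A, 0.0)  # no self-loops
    return np.diag(A.sum(axis=1)) - A


def recover_cluster(L, seeds, sparsity):
    """Return the seeds plus the `sparsity` unlabeled nodes that best complete them.

    Assumption: exhaustive search over all supports of the given size replaces
    Subspace Pursuit; this is only practical for very small graphs.
    """
    n = L.shape[0]
    seeds = sorted(seeds)
    rest = [j for j in range(n) if j not in seeds]
    # If the cluster indicator is in the null space of L, the unlabeled
    # members x satisfy L[:, rest] x = -L[:, seeds] 1.
    y = -L[:, seeds].sum(axis=1)
    best_support, best_residual = None, np.inf
    for support in itertools.combinations(rest, sparsity):
        cols = L[:, list(support)]
        coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
        residual = np.linalg.norm(cols @ coef - y)
        if residual < best_residual:
            best_support, best_residual = support, residual
    return sorted(seeds + list(best_support))


# Target cluster is nodes 0..3; nodes 4..7 form the rest of the graph.
L = toy_laplacian([4, 4])
cluster = recover_cluster(L, seeds=[0], sparsity=3)
print(cluster)  # [0, 1, 2, 3]
```

In this idealized setting the sparse solution is unique because the other clique has more than three nodes, so no three-column combination outside the target cluster can reproduce the right-hand side. Real graphs have edges between clusters, which perturb the linear system; that is the regime the paper's perturbation and RIP-like conditions address.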
