Probability-density-aware Semi-supervised Learning
Shuyang Liu, Ruiqiu Zheng, Yunhang Shen, Ke Li, Xing Sun, Zhou Yu, Shaohui Lin
TL;DR
This work addresses a gap in semi-supervised learning (SSL): conventional similarity-based neighbor measures underutilize cluster structure. It introduces a Probability-Density-Aware Measure (PM) that incorporates density information along paths between data points, and a Probability-Density-Aware Measure Label Propagation algorithm (PMLP) that uses PM to propagate high-confidence pseudo-labels more reliably. The authors provide two theoretical results: a statistical explanation of the cluster assumption in terms of the density along paths between points, and a proof that a PM-based label propagation algorithm (LPA) improves over traditional LPA, with pseudo-labeling shown to be a special case of PMLP as the influence of density grows. Empirically, PMLP yields state-of-the-art or near-state-of-the-art performance on multiple benchmarks (SVHN, CIFAR10/100, STL-10), improves pseudo-label quality, and can be accelerated via GPU kernel density estimation (KDE), making it a practical density-aware framework for SSL.
Abstract
Semi-supervised learning (SSL) assumes that neighboring points lie in the same category (the neighbor assumption) and that points in different clusters belong to different categories (the cluster assumption). Existing methods usually rely on similarity measures to retrieve similar neighboring points while ignoring the cluster assumption, and may therefore fail to exploit unlabeled information sufficiently and effectively. This paper first provides a systematic investigation into the significant role of probability density in SSL and lays a solid theoretical foundation for the cluster assumption. To this end, we introduce a Probability-Density-Aware Measure (PM) to discern the similarity between neighboring points. To further improve label propagation, we also design a Probability-Density-Aware Measure Label Propagation (PMLP) algorithm that fully accounts for the cluster assumption during propagation. Finally, we prove that traditional pseudo-labeling can be viewed as a particular case of PMLP, which provides a comprehensive theoretical understanding of PMLP's superior performance. Extensive experiments demonstrate that PMLP achieves outstanding performance compared with other recent methods.
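To make the idea of a density-aware measure concrete, the following is a minimal NumPy sketch, not the paper's actual definition of PM or PMLP. It assumes one plausible instantiation: a Gaussian similarity between two points is scaled by the lowest KDE density found along the straight line joining them, so pairs separated by a low-density gap receive a small affinity. The function names (`gaussian_kde_density`, `density_aware_affinity`, `propagate_labels`), the min-over-path rule, and all parameter values are illustrative assumptions; the propagation step is the standard normalized label-spreading iteration, used here as a stand-in.

```python
import numpy as np

def gaussian_kde_density(queries, data, bandwidth=0.5):
    """Gaussian kernel density estimate of `data`, evaluated at `queries`."""
    d = data.shape[1]
    sq_dist = ((queries[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    norm = (2 * np.pi * bandwidth ** 2) ** (d / 2)
    return np.exp(-sq_dist / (2 * bandwidth ** 2)).mean(axis=1) / norm

def density_aware_affinity(X, bandwidth=0.5, sigma=1.0, n_steps=5):
    """Hypothetical density-aware affinity (an assumption, not the paper's PM):
    Gaussian similarity times the minimum KDE density sampled along the
    straight segment between each pair of points.
    Costs O(n^3) memory in this naive form, so it is for illustration only."""
    n = X.shape[0]
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    similarity = np.exp(-sq_dist / (2 * sigma ** 2))
    path_density = np.full((n, n), np.inf)
    for t in np.linspace(0.0, 1.0, n_steps):
        # Point at fraction t of the way from X[i] to X[j], for all pairs.
        pts = (1 - t) * X[:, None, :] + t * X[None, :, :]
        dens = gaussian_kde_density(pts.reshape(n * n, -1), X, bandwidth)
        path_density = np.minimum(path_density, dens.reshape(n, n))
    W = similarity * path_density
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

def propagate_labels(W, Y0, alpha=0.9, n_iter=50):
    """Standard label spreading: F <- alpha * S @ F + (1 - alpha) * Y0,
    with S the symmetrically normalized affinity matrix."""
    deg = np.maximum(W.sum(axis=1), 1e-12)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    F = Y0.copy()
    for _ in range(n_iter):
        F = alpha * (S @ F) + (1 - alpha) * Y0
    return F

# Toy data: two tight clusters separated by a low-density gap.
cluster_a = np.array([[0.0, 0.0], [0.3, 0.0], [0.0, 0.3], [0.3, 0.3]])
cluster_b = cluster_a + np.array([3.0, 0.0])
X = np.vstack([cluster_a, cluster_b])

# One labeled point per cluster; all other rows are unlabeled (all zeros).
Y0 = np.zeros((8, 2))
Y0[0, 0] = 1.0
Y0[4, 1] = 1.0

W = density_aware_affinity(X)
pred = propagate_labels(W, Y0).argmax(axis=1)
```

In this toy setup, pairs inside a cluster keep a comparatively large affinity, while pairs that straddle the gap are penalized twice: once by distance and once by the low density along the path. A practical implementation would restrict the computation to nearest neighbors and evaluate the KDE in batches (the TL;DR mentions GPU acceleration), which this sketch does not attempt.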
