Robust Semi-Supervised Learning in Open Environments
Lan-Zhe Guo, Lin-Han Jia, Jie-Jing Shao, Yu-Feng Li
TL;DR
Open environments introduce inconsistencies in labels, features, and distributions between labeled and unlabeled data, which violates standard SSL assumptions. The paper formalizes this with a consistency degree $t\in[0,1]$, surveys strategies for handling label, feature, and distribution inconsistency, and presents benchmark datasets, evaluation metrics, and a public toolkit (LAMDA-SSL) to standardize the assessment of robust SSL. Key approaches include detecting and removing irrelevant unlabeled instances from unseen classes, feature-space reconciliation with robust regularization, and distribution-aware or bidirectional adaptation with calibrated pseudo-labels. All are evaluated with a suite of metrics derived from the robustness curve $Acc(t)$. The work provides practical infrastructure for testing robustness across modalities and highlights open problems spanning theory, data heterogeneity, and integration with pre-trained models and decision-making tasks. Overall, it advances robust SSL research toward reliable performance in realistic, open-world settings.
Abstract
Semi-supervised learning (SSL) aims to improve performance by exploiting unlabeled data when labels are scarce. Conventional SSL studies typically assume closed environments, in which important factors (e.g., label, feature, distribution) are consistent between labeled and unlabeled data. Many practical tasks, however, involve open environments in which these factors are inconsistent. It has been reported that exploiting inconsistent unlabeled data can cause severe performance degradation, sometimes yielding results even worse than a simple supervised learning baseline. Manually verifying the quality of unlabeled data is impractical, so it is important to study robust SSL with inconsistent unlabeled data in open environments. This paper briefly introduces some advances in this line of research, focusing on techniques that address label, feature, and data distribution inconsistency in SSL, and presents the evaluation benchmarks. Open research problems are also discussed for reference.
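To make the idea of a robustness curve concrete, the sketch below summarizes a set of accuracy measurements $Acc(t)$ taken at several values of $t\in[0,1]$. It is an illustration under stated assumptions, not the paper's metric definitions and not part of the LAMDA-SSL API: the function name `robustness_summary`, the three statistics chosen (trapezoidal area under the curve, mean accuracy, worst-case accuracy), and the numbers in the usage example are all hypothetical.

```python
import numpy as np


def robustness_summary(t, acc):
    """Summarize a robustness curve Acc(t) sampled at points t in [0, 1].

    Hypothetical helper: these statistics are common ways to summarize a
    curve and are assumptions, not the paper's exact metric suite.
    """
    t = np.asarray(t, dtype=float)
    acc = np.asarray(acc, dtype=float)
    if t.ndim != 1 or t.shape != acc.shape or t.size < 2:
        raise ValueError("t and acc must be 1-D arrays of equal length >= 2")
    if t.min() < 0.0 or t.max() > 1.0:
        raise ValueError("t values must lie in [0, 1]")

    # Sort by t so the trapezoidal rule integrates left to right.
    order = np.argsort(t)
    t, acc = t[order], acc[order]

    # Trapezoidal rule: sum of interval width times mean height.
    auc = float(np.sum(np.diff(t) * (acc[1:] + acc[:-1]) / 2.0))

    return {
        "auc": auc,                      # area under Acc(t)
        "mean_acc": float(acc.mean()),   # average over the sampled points
        "worst_acc": float(acc.min()),   # worst case over the sampled points
    }


# Made-up accuracies at five values of t, for illustration only.
example = robustness_summary(
    t=[0.0, 0.25, 0.5, 0.75, 1.0],
    acc=[0.60, 0.64, 0.70, 0.74, 0.80],
)
print(example)
```

Reporting a worst-case value alongside an average is a deliberate choice here: a method can look strong on average while still falling below the supervised baseline at some value of $t$, which is the failure mode the abstract warns about.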
