Rethinking Open-World Semi-Supervised Learning: Distribution Mismatch and Inductive Inference
Seongheon Park, Hyuk Kwon, Kwanghoon Sohn, Kibok Lee
TL;DR
The paper addresses a practical open-world semi-supervised learning setting, termed ROWSSL, in which class distributions are long-tailed and the class priors of labeled and unlabeled data may differ. It introduces density-based temperature scaling (DTS) and soft pseudo-labeling, supported by tailedness prototypes that estimate local density and tailedness in the representation space so that head and tail classes can be balanced dynamically. The method jointly learns representations and a prototypical classifier with self-supervised and supervised objectives, together with a density-driven pseudo-labeling mechanism that accounts for class uncertainty. The authors report gains over state-of-the-art OWSSL methods on CIFAR-100-LT and ImageNet-100-LT in both inductive and transductive settings, with ablations and qualitative evidence (e.g., t-SNE visualizations) indicating more discriminative, balanced representations and better recognition of novel tail classes. The work offers a practical, scalable framework for ROWSSL that accounts for real-world data shifts and deployment constraints while advancing open-world category discovery and classification under missing-not-at-random (MNAR) conditions.
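To make the idea of density-based temperature scaling with soft pseudo-labels more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes local density can be approximated by the mean cosine similarity to the k nearest neighbours, that temperatures are a linear function of normalized density, and that low-density samples (presumed tail) receive the lower temperature. The function names, the range `t_min`/`t_max`, and the direction of the mapping are all illustrative assumptions; the summary above does not specify them.

```python
import numpy as np


def knn_density(features, k=5):
    """Density proxy (assumption): mean cosine similarity to the k nearest neighbours."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T
    np.fill_diagonal(sim, -np.inf)          # exclude each sample's similarity to itself
    topk = np.sort(sim, axis=1)[:, -k:]     # k largest similarities per row
    return topk.mean(axis=1)                # higher value = denser neighbourhood


def density_temperatures(density, t_min=0.05, t_max=0.5):
    """Map density linearly to a per-sample temperature in [t_min, t_max].

    Assumption: sparse (presumed tail) samples get the lower temperature.
    """
    span = density.max() - density.min()
    norm = (density - density.min()) / (span + 1e-12)
    return t_min + (t_max - t_min) * norm


def soft_pseudo_labels(features, prototypes, temperatures):
    """Softmax over cosine similarities to class prototypes, scaled per sample."""
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    logits = (z @ p.T) / temperatures[:, None]
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)
```

Given an `(N, D)` feature array and a `(C, D)` prototype array, chaining `knn_density`, `density_temperatures`, and `soft_pseudo_labels` yields an `(N, C)` matrix of soft pseudo-labels whose rows sum to one. Soft labels keep the per-class uncertainty that a hard argmax would discard.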
Abstract
Open-world semi-supervised learning (OWSSL) extends conventional semi-supervised learning to open-world scenarios by accounting for novel categories in unlabeled datasets. Despite recent advances in OWSSL, its success often relies on two assumptions: 1) labeled and unlabeled datasets share the same balanced class prior distribution, which does not generally hold in real-world applications, and 2) unlabeled training datasets are used for evaluation, although such transductive inference might not adequately address challenges in the wild. In this paper, we aim to generalize OWSSL by relaxing both assumptions. Our work suggests that practical OWSSL may require different training settings, evaluation methods, and learning strategies compared to those prevalent in the existing literature.
