Rethinking Open-World Semi-Supervised Learning: Distribution Mismatch and Inductive Inference

Seongheon Park; Hyuk Kwon; Kwanghoon Sohn; Kibok Lee

TL;DR

The paper addresses practical open-world semi-supervised learning under long-tailed distributions and a possible class-prior mismatch between labeled and unlabeled data, a setting it calls ROWSSL. It introduces density-based temperature scaling (DTS) with soft pseudo-labeling. Tailedness prototypes estimate local density and tailedness in the representation space, which lets the method balance head and tail classes dynamically. Representations and a prototypical classifier are learned jointly with self-supervised and supervised objectives, together with a density-driven pseudo-labeling mechanism that accounts for class uncertainty. The authors report gains over state-of-the-art OWSSL methods on CIFAR-100-LT and ImageNet-100-LT in both inductive and transductive settings. Ablations and qualitative evidence (e.g., t-SNE visualizations) indicate more discriminative, balanced representations and better recognition of novel tail classes. The work positions DTS as a practical framework for ROWSSL that handles real-world distribution shift and deployment constraints, advancing open-world category discovery and classification under missing-not-at-random (MNAR) conditions.

Abstract

Open-world semi-supervised learning (OWSSL) extends conventional semi-supervised learning to open-world scenarios by accounting for novel categories in unlabeled datasets. Despite recent advances in OWSSL, its success often relies on two assumptions: 1) labeled and unlabeled datasets share the same balanced class prior distribution, which does not generally hold in real-world applications, and 2) unlabeled training datasets are used for evaluation, where such transductive inference might not adequately address challenges in the wild. In this paper, we aim to generalize OWSSL by relaxing both assumptions. Our work suggests that practical OWSSL may require different training settings, evaluation methods, and learning strategies compared to those prevalent in the existing literature.

Paper Structure (41 sections, 11 equations, 6 figures, 16 tables)

Introduction
ROWSSL
Training setting
Evaluation setting and metrics
Transductive inference.
Transductive inference on the test set.
Inductive inference.
Proposed Method
Training objectives
Representation learning.
Classifier learning.
Constructing tailedness prototypes
Tailedness estimation.
Prototype update.
Density-based learning strategy
...and 26 more sections

Figures (6)

Figure 1: Examples of scenarios considered in ROWSSL.
Figure C.1: (a) Overall framework of the proposed DTS. "\\ \\" stands for stop gradient. (b) Example of tailedness estimation.
Figure D.1: Comparison of tail discovery methods.
Figure E.1: Analysis of hyperparameters.
Figure H.1: t-SNE visualization on the test set of CIFAR-100-LT.
...and 1 more figure
