OwMatch: Conditional Self-Labeling with Consistency for Open-World Semi-Supervised Learning

Shengjie Niu, Lifan Lin, Jian Huang, Chao Wang

TL;DR

This study revisits two methodologies from self-supervised and semi-supervised learning, self-labeling and consistency, tailoring them to address the OwSSL problem, and proposes an effective framework called OwMatch, combining conditional self-labeling and open-world hierarchical thresholding.

Abstract

Semi-supervised learning (SSL) offers a robust framework for harnessing the potential of unannotated data. Traditionally, SSL mandates that all classes possess labeled instances. However, the emergence of open-world SSL (OwSSL) introduces a more practical challenge, wherein unlabeled data may encompass samples from unseen classes. This scenario leads to misclassification of unseen classes as known ones, consequently undermining classification accuracy. To overcome this challenge, this study revisits two methodologies from self-supervised and semi-supervised learning, self-labeling and consistency, tailoring them to address the OwSSL problem. Specifically, we propose an effective framework called OwMatch, combining conditional self-labeling and open-world hierarchical thresholding. Theoretically, we analyze the estimation of class distribution on unlabeled data through rigorous statistical analysis, thus demonstrating that OwMatch can ensure the unbiasedness of the self-label assignment estimator with reliability. Comprehensive empirical analyses demonstrate that our method yields substantial performance enhancements across both known and unknown classes in comparison to previous studies. Code is available at https://github.com/niusj03/OwMatch.

Paper Structure

This paper contains 50 sections, 5 theorems, 33 equations, 7 figures, 16 tables.

Key Result

Lemma 4.2

Suppose we want to test the null hypothesis ($H_0$) that categorical data $N_1, N_2,\cdots, N_{\mathcal{C}}$ come from a multinomial distribution with $K$ classes and class probabilities $\boldsymbol{\mathcal{P}}$. A chi-square statistic can be constructed to test the deviation between the observat[...] where $\mathbb{E}_{\boldsymbol{\mathcal{P}}}[\cdot]$ denotes the population expectation of random v[...]
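
The lemma statement is truncated in this summary, so the exact statistic used in the paper cannot be recovered here. As a rough illustration of the kind of test it describes, the sketch below computes the standard Pearson chi-square goodness-of-fit statistic, $\sum_k (N_k - n p_k)^2 / (n p_k)$, for observed class counts against hypothesized class probabilities. The function name `pearson_chi_square`, the example counts, and the use of $K$ for the number of classes are assumptions made for illustration; they are not taken from the paper or its code, and the paper's statistic may differ in form.

```python
from typing import Sequence


def pearson_chi_square(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Standard Pearson chi-square statistic for a multinomial null hypothesis.

    counts: observed number of samples assigned to each of the K classes.
    probs:  hypothesized class probabilities under H0 (positive, summing to 1).
    Hypothetical helper for illustration only; not the paper's implementation.
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if any(p <= 0.0 for p in probs):
        raise ValueError("all class probabilities must be positive")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")

    n = sum(counts)  # total number of samples
    stat = 0.0
    for n_k, p_k in zip(counts, probs):
        expected = n * p_k  # expected count for this class under H0
        stat += (n_k - expected) ** 2 / expected
    return stat


# Made-up example: 100 samples over 4 classes, tested against a uniform prior.
example_counts = [18, 22, 20, 40]
example_probs = [0.25, 0.25, 0.25, 0.25]
example_stat = pearson_chi_square(example_counts, example_probs)
print(f"chi-square statistic: {example_stat:.2f}")
```

Under $H_0$ and for large $n$, this statistic is approximately chi-square distributed with $K-1$ degrees of freedom, so a large value suggests that the observed counts deviate from the assumed class distribution.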

Figures (7)

  • Figure 1: Experimental results on the OwSSL problem. (a) Self-label assignment of seen classes (1-5) and novel classes (6-10) with and without the conditional component in self-labeling. (b) Predictive confidence and hierarchical threshold for each class.
  • Figure 2: Overview of the OwMatch framework.
  • Figure 3: Illustration of the hierarchical thresholding scheme.
  • Figure 4: Confusion matrices on CIFAR-10 with both novel class ratio and label ratio of 50%. The model needs to classify the five seen classes (1-5) accurately, as reflected in the diagonal elements. For the novel classes (6-10), each cluster is required to align with a ground-truth label (dark blue in one cell).
  • Figure 5: Accuracy as a function of class number estimation error on the CIFAR-100 dataset.
  • ...and 2 more figures

Theorems & Definitions (11)

  • Lemma 4.2
  • Definition 4.3: Expectation of chi-square statistics (ECS)
  • Theorem 4.4
  • Theorem 4.5
  • Lemma E.1
  • Lemma E.2
  • proof
  • proof
  • proof
  • proof
  • ...and 1 more