Unlocking the Power of Open Set : A New Perspective for Open-Set Noisy Label Learning

Wenhai Wan, Xinrui Wang, Ming-Kun Xie, Shao-Yuan Li, Sheng-Jun Huang, Songcan Chen

TL;DR

This work tackles Open-Set Noisy Label Learning (OSNLL) by revealing the Class Expansion phenomenon, where some open-set examples become integrated into known classes and can actually aid learning. It introduces CECL, a two-step framework that first identifies clean vs. noisy data and then performs prototype-guided contrastive learning to incorporate select open-set samples into closed-set classes while using others as delimiters to sharpen class boundaries. A theoretical result under $(\boldsymbol{\sigma},\boldsymbol{\delta})$-Augmentation shows that including distinguishable open-set examples can tighten inter-class separation, corroborated by extensive experiments on CIFAR-based and real-world noisy datasets where CECL consistently outperforms strong baselines. The approach yields improved representation quality and discrimination in noisy settings, suggesting a practical paradigm for leveraging open-set information in robust visual classification.

Abstract

Learning from noisy data has attracted much attention, and most methods focus on closed-set label noise. However, a more common scenario in the real world is the presence of both open-set and closed-set noise. Existing methods typically identify and handle these two types of label noise separately by designing a specific strategy for each type. In many real-world scenarios, however, it is challenging to identify open-set examples, especially when the dataset has been severely corrupted. Unlike previous work, we explore how models behave when faced with open-set examples, and find that some open-set examples are gradually integrated into certain known classes, which is beneficial for the separation among known classes. Motivated by this phenomenon, we propose a novel two-step contrastive learning method, CECL (Class Expansion Contrastive Learning), which aims to deal with both types of label noise by exploiting the useful information in open-set examples. Specifically, we incorporate some open-set examples into closed-set classes to enhance performance while treating others as delimiters to improve representation ability. Extensive experiments on synthetic and real-world datasets with diverse label noise demonstrate the effectiveness of CECL.
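
The split between class-expansion examples and delimiters can be pictured with a small sketch. It is an illustration only and not the authors' implementation: the cosine-similarity rule, the `threshold` value, and the names `class_prototypes` and `split_noisy` are all assumptions introduced here, since the abstract does not state the actual selection criterion.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def class_prototypes(embeddings, labels):
    """Mean embedding of the clean examples of each class."""
    sums, counts = {}, {}
    for emb, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(emb))
        for i, x in enumerate(emb):
            acc[i] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}


def split_noisy(noisy_embeddings, prototypes, threshold=0.8):
    """Hypothetical rule: a noisy example close enough to its nearest
    prototype joins that class (class expansion); all others are kept
    as delimiters."""
    expanded, delimiters = [], []
    for idx, emb in enumerate(noisy_embeddings):
        best_class, best_sim = max(
            ((y, cosine(emb, p)) for y, p in prototypes.items()),
            key=lambda pair: pair[1],
        )
        if best_sim >= threshold:
            expanded.append((idx, best_class))
        else:
            delimiters.append(idx)
    return expanded, delimiters


# Toy 2-D embeddings (made up for illustration).
clean = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["cat", "cat", "dog", "dog"]
prototypes = class_prototypes(clean, labels)

noisy = [[0.8, 0.2], [-0.7, -0.7]]
expanded, delimiters = split_noisy(noisy, prototypes)
# expanded == [(0, "cat")], delimiters == [1]
```

In this toy setup the first noisy example lies close to the "cat" prototype and is absorbed into that class, while the second is far from every prototype and is kept as a delimiter. How CECL actually makes this decision, and how the delimiters enter the contrastive loss, is not specified in this summary.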

Paper Structure (13 sections, 9 equations, 7 figures, 3 tables)

This paper contains 13 sections, 9 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: An example of the open-set noisy label learning problem. The known classes of interest are $\{\textit{cat, dog, elephant}\}$. The left, middle, and right columns show images that are correctly labeled, mislabeled with closed-set noise, and mislabeled with open-set noise, respectively.
  • Figure 2: The distribution of open-set examples among different known classes on CIFAR10 and MNIST, respectively.
  • Figure 3: Experimental results of incorporating examples from open-set classes into known classes for learning on CIFAR10 and Tiny-ImageNet. 'CS' and 'CS and OS' denote training only on closed-set class examples and on a mixture of closed-set and open-set class examples, respectively.
  • Figure 4: The intuition of CECL. CECL incorporates certain indistinct open-set examples into the known classes, where they are expected to contribute to class expansion and better generalization. The distinguishable open-set examples are used as delimiters, which are expected to push the known classes apart and improve discrimination.
  • Figure 5: Illustration of CECL. Using the information obtained in the first step, clean examples are used to generate a prototype for each class, certain open-set examples are incorporated into known classes as class expansion, and the remaining ones are treated as delimiters. The momentum embeddings are maintained in a queue. '//' denotes stop-gradient.
  • ...and 2 more figures