A Deep Model for Partial Multi-Label Image Classification with Curriculum Based Disambiguation

Feng Sun, Ming-Kun Xie, Sheng-Jun Huang

TL;DR

This paper proposes a curriculum-based disambiguation strategy that progressively identifies ground-truth labels by accounting for the varied difficulties of different classes, and introduces consistency regularization during training to balance fitting the identified easy labels against exploiting potentially relevant labels.

Abstract

In this paper, we study the partial multi-label (PML) image classification problem, where each image is annotated with a candidate label set that consists of multiple relevant labels and other noisy labels. Existing PML methods typically design a disambiguation strategy to filter out noisy labels by utilizing prior knowledge with extra assumptions, which unfortunately is unavailable in many real tasks. Furthermore, because the objective function for disambiguation is usually elaborately designed on the whole training set, it can hardly be optimized in a deep model with SGD on mini-batches. In this paper, for the first time, we propose a deep model for PML to enhance the representation and discrimination ability. On the one hand, we propose a novel curriculum-based disambiguation strategy to progressively identify ground-truth labels by incorporating the varied difficulties of different classes. On the other hand, a consistency regularization is introduced for model retraining to balance fitting identified easy labels and exploiting potentially relevant labels. Extensive experimental results on commonly used benchmark datasets show that the proposed method significantly outperforms the state-of-the-art (SOTA) methods.
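
The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the threshold rule, the `margin` parameter, the threshold schedule, the 0.5 loss weight and all function names are assumptions made here for illustration only. The sketch assumes that a candidate label counts as identified once its predicted probability passes a per-class threshold, that harder classes receive a lower threshold, and that identified labels keep a weight of 1 in later rounds.

```python
import numpy as np


def identify_labels(probs, candidates, weights, base_threshold, difficulty, margin=0.2):
    """Grow the set of identified labels; weights only ever move from 0 to 1."""
    # Assumed rule: harder classes (difficulty near 1) get a lower threshold.
    thresholds = base_threshold - margin * difficulty      # shape (num_classes,)
    confident = (probs >= thresholds) & (candidates == 1)  # only candidate labels qualify
    return np.maximum(weights, confident.astype(float))


def weighted_bce(probs, candidates, weights, eps=1e-7):
    """Binary cross-entropy that trusts non-candidates as negatives and
    counts candidate labels as positives only once they are identified."""
    p = np.clip(probs, eps, 1.0 - eps)
    positive = -weights * candidates * np.log(p)
    negative = -(1.0 - candidates) * np.log(1.0 - p)
    return float(np.mean(positive + negative))


def consistency(probs_a, probs_b):
    """Mean squared disagreement between predictions on two augmented views."""
    return float(np.mean((probs_a - probs_b) ** 2))


# Toy data: 2 images, 3 classes (values are made up).
probs = np.array([[0.9, 0.6, 0.2], [0.4, 0.95, 0.7]])
candidates = np.array([[1, 1, 0], [1, 1, 1]])
difficulty = np.array([0.0, 0.0, 1.0])

weights = np.zeros_like(probs)
for base_threshold in (0.8, 0.6):  # assumed curriculum: relax the threshold each round
    weights = identify_labels(probs, candidates, weights, base_threshold, difficulty)

# A real model would score two different augmentations of each image;
# the same predictions are reused here, so the consistency term is zero.
loss = weighted_bce(probs, candidates, weights) + 0.5 * consistency(probs, probs)
```

In this sketch the curriculum is the relaxing threshold: confident labels are fixed first, and less confident candidates are admitted only in later rounds.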

Paper Structure

This paper contains 14 sections, 8 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An example of partial multi-label image classification. The image is partially labeled by annotators with different levels of expertise on a crowdsourcing platform. In the candidate label set, grass, leopard and deer are ground-truth labels, while lion, tiger, antelope and flower are noisy labels.
  • Figure 2: Illustration of our CDCR framework. The CDCR strategy progressively identifies true labels and sets their weights to 1. Consistency regularization is used to retrain the model by applying a stochastic augmentation to each training image.
  • Figure 3: Performance of different disambiguation strategies on VOC with flipping rate $q=0.4$. CDCR achieves much better disambiguation performance than CD because consistency regularization makes it more robust to noisy labels among the identified labels.
  • Figure 4: Per-class mAP increments between each pair of strategies, covering one baseline and three strategies (BCE, curriculum based disambiguation, consistency regularization and class difficulty), used to evaluate our method. Class indices are sorted from left to right by noise rate.
  • Figure 5: Accuracy comparison for different values of $\alpha$ on VOC2007 and COCO.