Asymmetric Loss For Multi-Label Classification

Emanuel Ben-Baruch, Tal Ridnik, Nadav Zamir, Asaf Noy, Itamar Friedman, Matan Protter, Lihi Zelnik-Manor

TL;DR

The paper tackles the pervasive negative–positive imbalance and label noise in multi-label image classification by introducing Asymmetric Loss (ASL), which decouples positive and negative sample contributions and adds probability shifting to hard-threshold easy negatives. It provides gradient and probability analyses and an adaptive scheme to adjust asymmetry during training, enabling robust learning of rare positives. Across MS-COCO, Pascal-VOC, NUS-WIDE, and Open Images, ASL delivers state-of-the-art mAP with standard architectures and training, and extends to single-label classification and object detection. The result is a simple, efficient loss that improves accuracy without altering model complexity or training time.

Abstract

In a typical multi-label setting, a picture contains on average few positive labels and many negative ones. This positive-negative imbalance dominates the optimization process and can lead to under-emphasizing gradients from positive labels during training, resulting in poor accuracy. In this paper, we introduce a novel asymmetric loss ("ASL"), which operates differently on positive and negative samples. The loss dynamically down-weights and hard-thresholds easy negative samples, while also discarding possibly mislabeled samples. We demonstrate how ASL can balance the probabilities of different samples, and how this balancing translates to better mAP scores. With ASL, we reach state-of-the-art results on multiple popular multi-label datasets: MS-COCO, Pascal-VOC, NUS-WIDE and Open Images. We also demonstrate ASL's applicability to other tasks, such as single-label classification and object detection. ASL is effective, easy to implement, and does not increase training time or complexity. Implementation is available at: https://github.com/Alibaba-MIIL/ASL.
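
To make the mechanism concrete, the following is a minimal sketch of a per-label asymmetric loss, assuming the commonly cited formulation: a focal-style positive term with exponent $\gamma_{+}$, and a negative term that uses exponent $\gamma_{-}$ on a probability shifted by a margin $m$ and clamped at zero. The function name, argument names, and the `eps` guard are illustrative assumptions, not the API of the official repository. The defaults mirror the values quoted in the Figure 4 caption ($\gamma_{+}=0$, $\gamma_{-}=2$, $m=0.2$).

```python
import math

def asymmetric_loss(p, y, gamma_pos=0.0, gamma_neg=2.0, m=0.2, eps=1e-8):
    """Illustrative per-label asymmetric loss (hypothetical helper).

    p: predicted probability for one label, in [0, 1]
    y: ground-truth label, 1 (positive) or 0 (negative)
    """
    if y == 1:
        # Positive term: focal-style weighting controlled by gamma_pos.
        return -((1.0 - p) ** gamma_pos) * math.log(max(p, eps))
    # Negative term: shift the probability by the margin m and clamp at 0,
    # so negatives with p <= m contribute exactly zero loss (hard threshold).
    p_m = max(p - m, 0.0)
    # gamma_neg additionally down-weights the remaining easy negatives.
    return -(p_m ** gamma_neg) * math.log(max(1.0 - p_m, eps))

def multilabel_asl(probs, targets, **kwargs):
    """Sum the per-label loss over all labels of one image."""
    return sum(asymmetric_loss(p, y, **kwargs) for p, y in zip(probs, targets))

# One image with four labels: one positive, three negatives.
probs = [0.6, 0.1, 0.15, 0.7]
targets = [1, 0, 0, 0]
print(multilabel_asl(probs, targets))
```

In this sketch, the two negatives below the margin (0.1 and 0.15) contribute nothing, so only the positive label and the hard negative at 0.7 drive the total. Setting `gamma_pos=0`, `gamma_neg=0`, and `m=0` reduces the function to plain binary cross-entropy.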

Paper Structure

This paper contains 27 sections, 11 equations, 9 figures, 13 tables.

Figures (9)

  • Figure 1: (a) Real-world challenges in multi-label classification. A typical image contains few positive samples and many negative ones, leading to high negative-positive imbalance. Also, missing labels in ground-truth are common in multi-label datasets. (b) Proposed solution with ASL. The loss properties will be detailed in Section \ref{sec:ASL_gradients}.
  • Figure 2: Loss Comparisons. Comparing probability-shifted focal loss to regular focal loss and cross-entropy, for negative samples. We used $\gamma_{-}=2$ and $m=0.2$.
  • Figure 3: Gradient Analysis. Comparing the loss gradients vs. probability for different loss regimes. CE = Cross-Entropy ($m=\gamma_{-}=0$), CE+PS = Cross-Entropy with Probability Shifting ($m>0,\gamma_{-}=0$), AF = Asymmetric Focusing ($m=0,\gamma_{-}>0$), ASL ($m>0,\gamma_{-}>0$).
  • Figure 4: Probability analysis. The mean probability of positive and negative samples along the training with cross-entropy, focal loss and ASL, on MS-COCO. For focal loss we used $\gamma=2$. For ASL we used $\gamma_{+}=0$, $\gamma_{-}=2$, $m=0.2$.
  • Figure 5: mAP Vs. Focal Loss $\gamma$. Comparing MS-COCO mAP score for different values of focal loss $\gamma$.
  • ...and 4 more figures