
SMCL: Saliency Masked Contrastive Learning for Long-tailed Recognition

Sanglee Park, Seung-won Hwang, Jungmin So

TL;DR

The paper addresses long-tailed recognition, where biased background features cause predictions to favor major classes. It introduces Saliency Masked Contrastive Learning (SMCL), which masks the salient regions of an image and combines minor-class-biased sampling with a mixed loss, built from $L_{MCE}$ and $L_{MSC}$, that pulls the masked backgrounds toward minor classes in the feature space. Empirical results on CIFAR-10-LT, CIFAR-100-LT, and ImageNet-LT show competitive or state-of-the-art performance, and ablations confirm the contribution of both saliency masking and the contrastive objective. The approach is simple to implement and improves generalization by reducing the bias that background features introduce, which matters for real-world long-tailed recognition tasks.
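Minor-class-biased sampling can be illustrated with inverse-frequency weights. The sketch below is only an assumption about the scheme: each sample of class $c$ gets weight $n_c^{-p}$, where $n_c$ is the number of training samples in that class; the exponent `power` and the helper names are hypothetical, and the summary does not state the exact sampling distribution SMCL uses.

```python
import random
from collections import Counter


def minor_biased_weights(labels, power=1.0):
    """Hypothetical per-sample weights proportional to n_c ** (-power)."""
    counts = Counter(labels)
    return [counts[y] ** (-power) for y in labels]


def sample_minor_biased(labels, k, power=1.0, seed=0):
    """Draw k sample indices with replacement, favoring rare classes."""
    rng = random.Random(seed)
    weights = minor_biased_weights(labels, power)
    return rng.choices(range(len(labels)), weights=weights, k=k)


# Toy 9:1 imbalance: class 0 is the major class, class 1 the minor class.
labels = [0] * 90 + [1] * 10
weights = minor_biased_weights(labels)
idx = sample_minor_biased(labels, k=10000)
frac_minor = sum(labels[i] == 1 for i in idx) / len(idx)
print(round(frac_minor, 2))  # close to 0.5 instead of the natural 0.1
```

With `power=1.0` every class receives the same total probability, so the minor class is drawn about as often as the major one; `power=0.0` falls back to ordinary uniform instance sampling.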

Abstract

Real-world data often follow a long-tailed distribution with a high imbalance in the number of samples between classes. The problem with training from imbalanced data is that some background features, common to all classes, can go unobserved in classes with scarce samples. As a result, these background features become correlated with predictions biased toward "major" classes. In this paper, we propose saliency masked contrastive learning, a new method that uses saliency masking and contrastive learning to mitigate this problem and improve the generalizability of a model. Our key idea is to mask the important part of an image using saliency detection and to use contrastive learning to move the masked image toward minor classes in the feature space, so that the background features present in the masked image are no longer correlated with the original class. Experimental results show that our method achieves state-of-the-art-level performance on benchmark long-tailed datasets.
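The masking step of the key idea can be sketched as follows. This is a minimal illustration, assuming a single-channel image and a precomputed saliency map of the same size (for example a CAM for the true class), both stored as nested Python lists; the function name `saliency_mask`, the `mask_ratio` parameter, and the fill value of 0 are hypothetical choices, not the authors' settings.

```python
def saliency_mask(image, saliency, mask_ratio=0.25, fill=0.0):
    """Hide the most salient pixels so that mostly background remains.

    Returns the masked copy of `image` and a binary mask (1 = masked).
    """
    h, w = len(image), len(image[0])
    # Rank every pixel position by saliency, most salient first.
    coords = [(i, j) for i in range(h) for j in range(w)]
    coords.sort(key=lambda ij: saliency[ij[0]][ij[1]], reverse=True)
    n_mask = int(round(mask_ratio * h * w))
    masked = [row[:] for row in image]  # leave the input image untouched
    mask = [[0] * w for _ in range(h)]
    for i, j in coords[:n_mask]:
        masked[i][j] = fill
        mask[i][j] = 1
    return masked, mask


image = [[1.0] * 4 for _ in range(4)]
saliency = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.95, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
masked, mask = saliency_mask(image, saliency, mask_ratio=0.25)
print(mask)  # [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
```

Masking by rank rather than by a fixed saliency threshold keeps the masked area constant across images; thresholding a normalized saliency map is an equally plausible alternative. In the method described above, the resulting background-only image would then be moved toward minor classes by the contrastive objective.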

Paper Structure (20 sections, 8 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: (a) A major-class sample (bird) and its class activation map (CAM). (b) A minor-class sample (horse) that the classifier misclassifies as "bird", shown with two CAMs: the middle image is the CAM for the predicted label (bird), and the right image is the CAM for the true label (horse). (c) Illustration of saliency masked contrastive learning.
  • Figure 2: Overview of the proposed framework.
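The CAMs in Figure 1 follow the standard class activation mapping of Zhou et al. (2016): for a network that applies global average pooling before its final linear layer, the map for class $c$ is $M_c(x, y) = \sum_k w^c_k f_k(x, y)$, where $f_k$ is the $k$-th channel of the last convolutional feature map and $w^c_k$ is the linear-layer weight connecting that channel to class $c$. The sketch below computes it with plain lists under that architectural assumption; the function names are hypothetical, and whether SMCL derives its masks from CAM or from another saliency detector is not stated in this summary.

```python
def class_activation_map(features, fc_weights, class_idx):
    """M_c(x, y) = sum_k w[c][k] * f_k(x, y).

    features: K channels, each an H x W nested list of activations.
    fc_weights: one row of K weights per class.
    """
    weights = fc_weights[class_idx]
    h, w = len(features[0]), len(features[0][0])
    return [
        [sum(wk * fk[i][j] for wk, fk in zip(weights, features)) for j in range(w)]
        for i in range(h)
    ]


def normalize(cam):
    """Min-max scale a map to [0, 1] (all zeros if the map is constant)."""
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in cam]
    return [[(v - lo) / (hi - lo) for v in row] for row in cam]


features = [
    [[1.0, 0.0], [0.0, 0.0]],  # channel 0 fires in the top-left cell
    [[0.0, 0.0], [0.0, 1.0]],  # channel 1 fires in the bottom-right cell
]
fc_weights = [[1.0, 0.0], [0.0, 2.0]]  # class 0 uses channel 0, class 1 channel 1
cam_class1 = normalize(class_activation_map(features, fc_weights, class_idx=1))
print(cam_class1)  # [[0.0, 0.0], [0.0, 1.0]]
```

Comparing the maps for the predicted and the true class, as in Figure 1(b), shows which regions drive each decision.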