Saliency-guided and Patch-based Mixup for Long-tailed Skin Cancer Image Classification

Tianyunxi Wei; Yijin Huang; Li Lin; Pujin Cheng; Sirui Li; Xiaoying Tang

TL;DR

This work addresses long-tailed skin cancer image classification with SPMix, a saliency-guided and patch-based mixup framework. Guided by lesion saliency maps, SPMix blends tail-class samples with head-class backgrounds at the feature level. It combines a lesion-aware mixup ratio $r = \min(\alpha, \max(s_h, s_t))$ with patch-wise ratios $r_i = \operatorname{avg}(s_i)$, followed by transformer-based representation learning and a supervised contrastive loss (SCL). The key contributions are the saliency-guided mixup mechanism, lesion-aware per-patch mixing, and an integrated SCL framework that improves tail-class performance while preserving head-class accuracy, as demonstrated on the ISIC2018 dataset with gains over prior methods. The approach is relevant to medical image analysis, where data imbalance is common and lesion-focused diagnosis is critical, because it offers an augmentation strategy that preserves the diagnostic features of tail classes.
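The two ratios in the TL;DR can be illustrated with a small sketch. This is not the authors' implementation: the function names (`patch_ratios`, `mix_patches`, `lesion_aware_ratio`) are hypothetical, the saliency map is assumed to be a 2D grid with values in [0, 1] whose sides are divisible by the patch size, and the mixing direction (ratio applied to the tail-class feature, remainder to the head-class feature) is an assumption chosen so that salient regions keep tail-class content. The summary does not define $s_h$ and $s_t$, so the last function only evaluates the formula as written.

```python
def patch_ratios(saliency, patch):
    """Average a 2D saliency map over non-overlapping patch x patch blocks.

    Assumes values in [0, 1] and side lengths divisible by `patch`.
    Returns one ratio r_i per patch, in row-major order.
    """
    rows, cols = len(saliency), len(saliency[0])
    ratios = []
    for top in range(0, rows, patch):
        for left in range(0, cols, patch):
            block = [saliency[i][j]
                     for i in range(top, top + patch)
                     for j in range(left, left + patch)]
            ratios.append(sum(block) / len(block))
    return ratios


def mix_patches(tail_feats, head_feats, ratios):
    """Blend per-patch feature vectors as r_i * tail + (1 - r_i) * head.

    Assumption: the ratio weights the tail-class feature.
    """
    return [
        [r * t + (1.0 - r) * h for t, h in zip(t_vec, h_vec)]
        for t_vec, h_vec, r in zip(tail_feats, head_feats, ratios)
    ]


def lesion_aware_ratio(alpha, s_h, s_t):
    """Evaluate r = min(alpha, max(s_h, s_t)) exactly as stated in the TL;DR."""
    return min(alpha, max(s_h, s_t))


# Toy 4x4 saliency map split into four 2x2 patches.
saliency = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.5, 0.5],
]
ratios = patch_ratios(saliency, 2)
print(ratios)  # [1.0, 0.0, 0.0, 0.5]
```

With these ratios, a fully salient patch keeps the tail-class feature unchanged, a non-salient patch takes the head-class feature, and a half-salient patch is an even blend of the two.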

Abstract

Medical image datasets often exhibit long-tailed distributions due to the inherent challenges in medical data collection and annotation. In long-tailed contexts, a few common disease categories account for most of the data, while only a few samples are available for the rare disease categories, resulting in poor performance of deep learning methods. To address this issue, previous approaches have employed class re-sampling or re-weighting techniques, which often suffer from overfitting to tail classes or from optimization difficulties during training. In this work, we propose a novel approach, namely Saliency-guided and Patch-based Mixup (SPMix), for long-tailed skin cancer image classification. Specifically, given a tail-class image and a head-class image, we generate a new tail-class image by mixing them under the guidance of saliency mapping, which preserves and augments the discriminative features of the tail classes without interference from the head-class features. Extensive experiments on the ISIC2018 dataset demonstrate the superiority of SPMix over existing state-of-the-art methods.

Paper Structure (14 sections, 2 equations, 4 figures, 2 tables)

Introduction
METHODS
Overall Framework
Saliency Guidance
Lesion-aware Mixup Ratio
Patch-based Mixup
Supervised Contrastive Loss
EXPERIMENTS
Dataset and Evaluation
Implementation Details
Comparison with State-of-the-art
Ablation Study
CONCLUSION
Acknowledgements

Figures (4)

Figure 1: A comparison of different mixup methods. We visualize SPMix at the image level. As shown, SPMix preserves the discriminative features of tail-class samples, whereas other methods may compromise these features or mix the images inadequately.
Figure 2: The overall SPMix framework. First, the tail-class image and the head-class image are each augmented in two different ways to generate the pairs $(x_{t1}, x_{t2})$ and $(x_{h1}, x_{h2})$. The pairs are fed into a query encoder and a key encoder, respectively; the two encoders share the same architecture, and the key encoder is updated by momentum from the query encoder. The saliency maps of the pairs are merged and patchified, and the average value of each patch is used as the patch-based mixup ratio. The mixed features serve as the new positive pair for the supervised contrastive loss.
Figure 3: Long-tailed class distribution of the ISIC2018 dataset.
Figure 4: Visualization of augmented images generated by SPMix at the image level. Columns (a) and (d) show the tail-class images, columns (b) and (e) show the head-class images, and columns (c) and (f) show the augmented images.
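The Figure 2 caption mentions a momentum-updated key encoder and a supervised contrastive loss. The sketch below illustrates both ideas in plain Python. It is not taken from the paper: the momentum value `m=0.999`, the temperature `0.1`, and the function names are hypothetical, and the loss follows the commonly used supervised contrastive formulation, which is assumed here rather than confirmed by this summary. Parameters and features are flat lists of floats for readability.

```python
import math


def momentum_update(key_params, query_params, m=0.999):
    """Exponential moving average: key <- m * key + (1 - m) * query.

    The momentum value m is a hypothetical default, not from the paper.
    """
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


def _normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over one batch of feature vectors.

    For each anchor i, positives are the other samples with the same label.
    The loss is averaged over anchors that have at least one positive.
    """
    z = [_normalize(f) for f in features]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without positives contribute nothing
        # Scaled cosine similarity between the anchor and every other sample.
        logits = {
            a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
            for a in range(n) if a != i
        }
        log_denom = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0


# Features that cluster by label give a lower loss than mismatched labels.
feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
aligned = supcon_loss(feats, [0, 0, 1, 1])
mismatched = supcon_loss(feats, [0, 1, 0, 1])
print(aligned < mismatched)  # True
```

In an SPMix-style setup, the mixed tail-class features would presumably enter such a loss as additional positives for the tail class; how the paper batches query and key features is not specified in this summary.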
