Pyramid Pixel Context Adaption Network for Medical Image Classification with Supervised Contrastive Learning

Xiaoqing Zhang, Zunjie Xiao, Xiao Wu, Yanlin Chen, Jilu Zhao, Yan Hu, Jiang Liu

TL;DR

This work addresses the limited effectiveness of long-range self-attention for detecting subtle lesions in medical images by introducing PPCA, a lightweight module that fuses multi-scale pixel context at a per-pixel level. PPCA consists of Cross-Channel Pyramid Pooling, Pixel Normalization, and Pixel Context Adaption, enabling per-pixel attention with negligible overhead, and is integrated into the PPCANet architecture. The approach is augmented with supervised contrastive loss to improve representation learning, yielding consistent improvements over SOTA attention methods and various deep networks across six medical datasets; visual analyses further illuminate how PPCA concentrates on informative lesion regions. The method offers practical efficiency and interpretability, with promising potential for broader medical image analysis and future extensions to 3D data and other vision tasks.

Abstract

Spatial attention mechanisms have been widely incorporated into deep neural networks (DNNs), significantly improving performance in computer vision tasks via long-range dependency modeling. However, they may perform poorly in medical image analysis, and existing efforts often overlook that long-range dependency modeling has limitations in highlighting subtle lesion regions. To overcome this limitation, we propose a practical yet lightweight architectural unit, the Pyramid Pixel Context Adaption (PPCA) module, which exploits multi-scale pixel context information to dynamically recalibrate each pixel position in a pixel-independent manner. PPCA first applies cross-channel pyramid pooling to aggregate multi-scale pixel context information, then eliminates the inconsistency among the pooled contexts through pixel normalization, and finally estimates a per-pixel attention weight via pixel context integration. By embedding PPCA into a DNN with negligible overhead, we develop PPCANet for medical image classification. In addition, we introduce supervised contrastive learning to enhance feature representation by exploiting label information via a supervised contrastive loss. Extensive experiments on six medical image datasets show that PPCANet outperforms state-of-the-art attention-based networks and recent deep neural networks. We also provide visual analysis and an ablation study to explain the behavior of PPCANet in the decision-making process.
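
The three PPCA stages named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives only the stage names, and the Figure 3 caption gives only the input and output shapes. The pyramid scales (1, 2, 4), the use of channel-group averaging for pooling, the per-pixel standardization, and the weighted sum followed by a sigmoid are all assumptions made for illustration, as are the function names.

```python
import numpy as np


def cross_channel_pyramid_pooling(x, scales=(1, 2, 4)):
    """Pool along the channel axis at several scales, separately for each pixel.

    x has shape (C, H, W); the result has shape (sum(scales), H, W).
    ASSUMPTION: each scale s splits the channels into s groups and averages them.
    """
    assert x.shape[0] >= max(scales), "need at least max(scales) channels"
    contexts = []
    for s in scales:
        for group in np.array_split(x, s, axis=0):
            contexts.append(group.mean(axis=0))
    return np.stack(contexts, axis=0)


def pixel_normalization(ctx, eps=1e-5):
    """Standardize the context vector of every pixel (zero mean, unit variance).

    ASSUMPTION: the normalization runs over the pooled-context axis only.
    """
    mean = ctx.mean(axis=0, keepdims=True)
    var = ctx.var(axis=0, keepdims=True)
    return (ctx - mean) / np.sqrt(var + eps)


def ppca_attention(x, weights, bias=0.0, scales=(1, 2, 4)):
    """Return a pixel attention weight map G of shape (1, H, W).

    ASSUMPTION: the final stage is a learned weighted sum of the normalized
    contexts followed by a sigmoid; `weights` has shape (sum(scales),).
    """
    ctx = pixel_normalization(cross_channel_pyramid_pooling(x, scales))
    logits = np.tensordot(weights, ctx, axes=([0], [0])) + bias  # (H, W)
    gate = 1.0 / (1.0 + np.exp(-logits))
    return gate[None, :, :]


def ppca_forward(x, weights, bias=0.0, scales=(1, 2, 4)):
    """Recalibrate the feature maps: every channel is scaled by the same G."""
    return x * ppca_attention(x, weights, bias, scales)
```

In this sketch each pixel's attention weight depends only on the channel values at that pixel, which matches the "pixel-independent" description, and the only learnable parameters are the few entries of `weights` plus a bias, which is consistent with the claim of negligible overhead.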

Paper Structure

This paper contains 32 sections, 10 equations, 9 figures, 13 tables.

Figures (9)

  • Figure 1: (a) An object region in a natural image, which stands out because its pixel value distribution differs from that of the other regions. (b) A subtle myopia lesion region on a fundus image, together with a comparison of the pixel value distributions of the subtle lesion region and a redundant region.
  • Figure 2: Pixel attention weight maps generated by NL (Wang et al., 2018), GC (Cao et al., 2019), EA (Guo et al., 2022), and PPCA at the high stage of ResNet18 for skin disease, blinding disease, and retinal disease across three medical image modalities: dermatoscopic, fundus, and optical coherence tomography (OCT) images. Our method emphasizes subtle lesion regions (red box) more accurately than the state-of-the-art spatial attention methods.
  • Figure 3: The detailed construction of the pyramid pixel context adaption (PPCA) module. Given the intermediate feature maps $X \in \mathbb{R}^{C \times H \times W}$, PPCA generates the pixel attention weight map $G \in \mathbb{R}^{1 \times H \times W}$.
  • Figure 4: (a) Pyramid pixel context adaption network (PPCANet) for medical image classification. (b) The Residual-PPCA module, which combines the PPCA module with the residual module. (c) Supervised contrastive loss, adopted as a supplement to cross-entropy loss to further exploit label information.
  • Figure 5: Training (left) and validation (right) curves on the ISIC2018 dataset with ResNet18 (baseline) and different spatial attention methods.
  • ...and 4 more figures
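
The pairing of supervised contrastive loss with cross-entropy loss described for Figure 4 (c) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's training code: the contrastive term follows the commonly used supervised contrastive formulation with L2-normalized features, and the temperature of 0.1, the additive combination, and the `weight` factor are hypothetical choices that the text above does not specify.

```python
import numpy as np


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over a batch.

    features: (N, D) embeddings, labels: (N,) integer class ids.
    Samples sharing a label are positives; all other samples are negatives.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    not_self = ~np.eye(len(labels), dtype=bool)
    # Subtract the row maximum for numerical stability (does not change the result).
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    n_pos = positives.sum(axis=1)
    valid = n_pos > 0  # anchors with at least one positive in the batch
    if not valid.any():
        return 0.0
    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[valid] / n_pos[valid]
    return float(-mean_log_prob_pos.mean())


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits: (N, K), labels: (N,) integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


def total_loss(logits, features, labels, weight=1.0, temperature=0.1):
    """ASSUMPTION: the two losses are summed, with a hypothetical `weight` factor."""
    contrastive = supervised_contrastive_loss(features, labels, temperature)
    return cross_entropy(logits, labels) + weight * contrastive
```

The contrastive term is small when embeddings of the same class cluster together and large when they are mixed with other classes, which is how the label information shapes the feature space beyond what the classification head alone sees.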