Pyramid Pixel Context Adaption Network for Medical Image Classification with Supervised Contrastive Learning
Xiaoqing Zhang, Zunjie Xiao, Xiao Wu, Yanlin Chen, Jilu Zhao, Yan Hu, Jiang Liu
TL;DR
This work addresses the limited ability of long-range self-attention to highlight subtle lesions in medical images. It introduces PPCA, a lightweight module that fuses multi-scale pixel context at the per-pixel level. PPCA consists of cross-channel pyramid pooling, pixel normalization, and pixel context integration. Together these produce per-pixel attention weights with negligible overhead, and the module is embedded into a DNN to form PPCANet. A supervised contrastive loss further improves representation learning by exploiting label information. Across six medical image datasets, PPCANet outperforms state-of-the-art attention-based networks and recent deep networks, and visual analyses show how PPCA concentrates on informative lesion regions. The method offers practical efficiency and interpretability, with promising potential for broader medical image analysis and future extensions to 3D data and other vision tasks.
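The abstract names PPCA's three stages (cross-channel pyramid pooling, pixel normalization, and pixel context integration) but does not give their formulas. The NumPy sketch below illustrates one plausible reading of that pipeline on a single (C, H, W) feature map. It is not the authors' implementation, and every concrete choice is a hypothetical assumption:

- The pyramid levels `(1, 2, 4)` split the channels into contiguous groups that are average-pooled along the channel axis.
- Pixel normalization standardizes each context map over its spatial positions, so that maps from different levels become comparable.
- Pixel context integration is a learned weighted sum of the normalized maps followed by a sigmoid, which yields one attention weight per pixel.

```python
import numpy as np

def cross_channel_pyramid_pool(x, levels=(1, 2, 4)):
    """Hypothetical cross-channel pyramid pooling.

    x has shape (C, H, W). At level g the C channels are split into g
    contiguous groups, and each group is averaged along the channel axis.
    Returns stacked context maps of shape (K, H, W) with K = sum(levels).
    """
    c = x.shape[0]
    maps = []
    for g in levels:
        if c % g != 0:
            raise ValueError(f"channel count {c} is not divisible by {g}")
        groups = x.reshape(g, c // g, *x.shape[1:])  # (g, C/g, H, W)
        maps.append(groups.mean(axis=1))             # (g, H, W)
    return np.concatenate(maps, axis=0)              # (K, H, W)

def pixel_normalize(ctx, eps=1e-5):
    """Hypothetical pixel normalization: standardize each context map over
    its H x W positions to remove scale differences between pyramid levels."""
    mean = ctx.mean(axis=(1, 2), keepdims=True)
    var = ctx.var(axis=(1, 2), keepdims=True)
    return (ctx - mean) / np.sqrt(var + eps)

def ppca_sketch(x, weights, bias=0.0, levels=(1, 2, 4)):
    """Hypothetical PPCA forward pass; `weights` has one entry per context map."""
    ctx = pixel_normalize(cross_channel_pyramid_pool(x, levels))  # (K, H, W)
    logits = np.tensordot(weights, ctx, axes=1) + bias            # (H, W)
    attn = 1.0 / (1.0 + np.exp(-logits))   # per-pixel attention weight in (0, 1)
    return x * attn[None, :, :], attn      # recalibrate every channel per pixel

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 5, 5))   # toy feature map: 8 channels, 5 x 5 pixels
w = rng.standard_normal(7)           # K = 1 + 2 + 4 context maps
y, attn = ppca_sketch(x, w)
print(y.shape, attn.shape)  # (8, 5, 5) (5, 5)
```

With levels `(1, 2, 4)`, the integration step in this sketch learns only K + 1 = 8 scalars. That is in the spirit of the "negligible overhead" claim, but the real module's parameter count is not stated here.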
Abstract
Spatial attention mechanisms have been widely incorporated into deep neural networks (DNNs), significantly improving performance in computer vision tasks through long-range dependency modeling. However, they may perform poorly in medical image analysis, because long-range dependency modeling is limited in its ability to highlight subtle lesion regions, a limitation that existing efforts often overlook. To overcome this limitation, we propose a practical yet lightweight architectural unit, the Pyramid Pixel Context Adaption (PPCA) module, which exploits multi-scale pixel context information to dynamically recalibrate each pixel position in a pixel-independent manner. PPCA first applies cross-channel pyramid pooling to aggregate multi-scale pixel context information. It then eliminates the inconsistency among these contexts through pixel normalization. Finally, it estimates a per-pixel attention weight via pixel context integration. By embedding PPCA into a DNN with negligible overhead, we develop PPCANet for medical image classification. In addition, we introduce supervised contrastive learning, which exploits label information through a supervised contrastive loss to enhance feature representations. Extensive experiments on six medical image datasets show that PPCANet outperforms state-of-the-art attention-based networks and recent deep neural networks. We also provide visual analyses and ablation studies to explain the behavior of PPCANet in the decision-making process.
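The abstract does not specify which form of supervised contrastive loss PPCANet uses. As a reference point, the NumPy sketch below implements the standard supervised contrastive (SupCon) loss of Khosla et al. (2020) on a batch of embeddings.

- Each sample acts as an anchor. Samples sharing its label are positives, and all other samples in the batch form the contrast set.
- The loss averages the negative log-probability of the positives for every anchor that has at least one positive.

Several details are assumptions and are not taken from this paper: the function name `supcon_loss`, the temperature of 0.1, and the choice to use this exact variant.

```python
import numpy as np

def supcon_loss(features, labels, temperature=0.1):
    """Standard SupCon loss (Khosla et al., 2020); hypothetical helper name.

    features: (N, D) embeddings; labels: length-N class labels.
    Requires at least one anchor that has a same-label partner in the batch.
    """
    z = features / np.linalg.norm(features, axis=1, keepdims=True)  # L2-normalize
    sim = z @ z.T / temperature                                      # (N, N) scaled cosine
    n = sim.shape[0]
    self_mask = np.eye(n, dtype=bool)
    sim = sim - sim.max(axis=1, keepdims=True)  # per-row shift for numerical stability
    exp_sim = np.exp(sim)
    exp_sim[self_mask] = 0.0                    # an anchor never contrasts with itself
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))
    labels = np.asarray(labels)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0                       # skip anchors without positives
    mean_log_prob_pos = (log_prob * pos_mask).sum(axis=1)[valid] / pos_count[valid]
    return -mean_log_prob_pos.mean()

# Toy batch: two tight clusters in 2-D.
feats = np.array([[1.0, 0.05], [1.0, -0.05], [0.05, 1.0], [-0.05, 1.0]])
loss_good = supcon_loss(feats, [0, 0, 1, 1])  # labels agree with the clusters
loss_bad = supcon_loss(feats, [0, 1, 0, 1])   # labels cut across the clusters
print(loss_good < loss_bad)  # True
```

The loss is small when same-label embeddings already cluster together and large when they do not. This is the property that lets label information shape the feature space. In practice such a loss is typically combined with a classification loss, but the abstract does not state how PPCANet weights the two.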
