Dual-View Pyramid Pooling in Deep Neural Networks for Improved Medical Image Classification and Confidence Calibration

Xiaoqing Zhang; Qiushi Nie; Zunjie Xiao; Jilu Zhao; Xiao Wu; Pengxin Guo; Runzhi Li; Jin Liu; Yanjie Wei; Yi Pan

TL;DR

This work addresses confidence miscalibration and accuracy gaps in medical image classification that arise from the limitations of spatial pooling (SP) and cross-channel pooling (CCP). It introduces a dual-view framework and a multi-scale pooling method, dual-view pyramid pooling (DVPP), that jointly uses spatial and pixel-wise features and comes in five parameter-free implementations. Across six 2D/3D medical datasets and multiple backbones, DVPP consistently improves classification metrics and calibration measures, which suggests that it generalizes across tasks and architectures. Visual analyses and ablations show how multi-scale dual-view features contribute to these gains, and the authors outline future work to extend DVPP to segmentation and detection and to deepen its theoretical understanding.

Abstract

Spatial pooling (SP) and cross-channel pooling (CCP) operators have been applied to aggregate spatial features and pixel-wise features, respectively, from feature maps in deep neural networks (DNNs). Their main goal is to reduce computation and memory overhead without visibly weakening the performance of DNNs. However, SP often loses subtle feature representations, while CCP is likely to ignore salient feature representations; both effects can lead to miscalibrated confidence and suboptimal medical image classification results. To address these problems, we propose a novel dual-view framework, the first to systematically investigate the relative roles of SP and CCP by analyzing the difference between spatial features and pixel-wise features. Based on this framework, we propose a new pooling method, termed dual-view pyramid pooling (DVPP), to aggregate multi-scale dual-view features. DVPP aims to boost both medical image classification and confidence calibration performance by fully leveraging the merits of SP and CCP operators from a dual-axis perspective. Additionally, we describe five parameter-free implementations of DVPP. Extensive experiments on six 2D/3D medical image classification tasks show that DVPP surpasses state-of-the-art pooling methods in both classification results and confidence calibration across different DNNs.
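The two pooling views contrasted in the abstract can be illustrated with a minimal NumPy sketch. It assumes a single feature map of shape (C, H, W), takes global average pooling (GAP) as the SP example, and assumes that cross-channel average pooling (CAP) averages over the channel axis at each spatial location. The function names and the toy input are hypothetical and are not taken from the paper; this is not the DVPP method itself.

```python
import numpy as np


def global_average_pool(feature_map):
    """Spatial view (GAP): average over H and W, giving one value per channel."""
    return feature_map.mean(axis=(1, 2))  # shape (C,)


def cross_channel_average_pool(feature_map):
    """Pixel-wise view (CAP, assumed): average over channels, one value per pixel."""
    return feature_map.mean(axis=0)  # shape (H, W)


# Toy feature map with C=2 channels on a 2x2 spatial grid (illustrative values).
x = np.array([[[1.0, 2.0], [3.0, 4.0]],
              [[5.0, 6.0], [7.0, 8.0]]])

gap = global_average_pool(x)          # per-channel summary
cap = cross_channel_average_pool(x)   # per-pixel summary
print(gap.shape, cap.shape)
```

In this sketch GAP collapses the spatial axes and keeps the channel axis, while CAP does the opposite, so the two outputs summarize the same feature map along different axes.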

Paper Structure (28 sections, 8 equations, 7 figures, 9 tables)

Introduction
Related Works
Pooling
DNN Calibration
Methodology
Rethinking Spatial Pooling and Cross-Channel Pooling with Dual-View Framework
Dual-view Pyramid Pooling
General Form
Parameter-free DVPP Implementation
Experiments
Datasets
Experimental Settings
Evaluation Metrics
Baselines
Ablation Study
...and 13 more sections

Figures (7)

Figure 1: Given a fundus image as the input, we apply GAP and CAP operators to aggregate global spatial average features and pixel-wise average features from the high-level feature maps of the last convolutional layer, including salient and subtle feature representations respectively. Here, we take the pre-trained ResNet50 as the backbone architecture and tackle the diabetic retinopathy (DR) grading task on the APTOS2019 dataset.
Figure 2: The simple implementations of GAP (a) and CAP (b) operators.
Figure 3: A representative implementation of our proposed dual-view pyramid pooling (DVPP). Here, we take SC-DVPP-C-Ser as the example to illustrate DVPP. Deep neural networks (DNNs) first take 2D/3D medical images as inputs and generate high-level feature maps. Next, DVPP performs the dual-view pyramid pooling operations to aggregate multi-scale dual-view features from these feature maps, which are fed directly into the classifier. Finally, the classifier generates precise and reliable predictions.
Figure 4: Five representative parameter-free DVPP implementations: SC-DVPP-Ser, SC-DVPP-S-Ser, SC-DVPP-C-Ser, SC-DVPP-Par, and Twins-DVPP.
Figure 5: The multi-scale dual-view feature maps and multi-scale dual-view feature statistics of SC-DVPP in SC-DVPP-C. The datasets are ISIC2018 and BTM.
...and 2 more figures
