Novel Pooling-based VGG-Lite for Pneumonia and Covid-19 Detection from Imbalanced Chest X-Ray Datasets

Santanu Roy, Ashvath Suresh, Palak Sahu, Tulika Rudra Gupta

TL;DR

The paper tackles severe class imbalance in chest X-ray detection of pneumonia and Covid-19 by introducing a lightweight VGG-Lite backbone augmented with a Complementary Edge Enhanced Module (CEEM) and a novel 2Max-Min pooling strategy. CEEM uses a negative image pathway and an edge-focused pooling operation to emphasize distinctive edge features, functioning as a spatial attention mechanism that improves minority-class performance while keeping model complexity low. On two imbalanced CXR datasets, the proposed VGG-Lite + CEEM framework outperforms contemporary CNNs and Vision Transformers, delivering high macro accuracy and AUC with substantially fewer parameters and faster convergence. The work demonstrates stable performance under 5-fold cross-validation and outlines future directions toward a universal Pneumonia-Net and broader, noisier clinical data handling, highlighting practical impact for lightweight, robust clinical CAD systems.

Abstract

This paper proposes a novel pooling-based VGG-Lite model to mitigate class imbalance in Chest X-Ray (CXR) datasets. Automatic pneumonia detection from CXR images with deep learning models has been a prominent and active area of research since the emergence of the new Covid-19 variant in 2020. However, standard Convolutional Neural Network (CNN) models struggle with class imbalance, a prevalent issue in many medical datasets. The proposed architecture introduces two innovations: (I) a very lightweight CNN model, "VGG-Lite", is proposed as the base model, inspired by the VGG-16 and MobileNet-V2 architectures; (II) on top of this base model, we add an "Edge Enhanced Module (EEM)" as a parallel branch, consisting of a "negative image layer" and a novel custom pooling layer, "2Max-Min Pooling". The 2Max-Min Pooling layer is introduced for the first time in this investigation and gives more attention to edge components within pneumonia CXR images; it thus works as an efficient spatial attention module (SAM). We have implemented the proposed framework on two separate CXR datasets. The first dataset is obtained from a readily available online source, and the second is a more challenging CXR dataset assembled by our research team from three different sources. Experimental results show that the proposed framework outperforms pre-trained CNN models and three recent models, "Vision Transformer", "Pooling-based Vision Transformer (PiT)" and "PneuNet", by substantial margins on both datasets. The proposed framework, VGG-Lite with EEM, achieves a macro average of 95% accuracy, 97.1% precision, 96.1% recall, and 96.6% F1 score on the "Pneumonia Imbalance CXR dataset" without employing any pre-processing technique.
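The abstract names the two operations in the edge-enhancement branch but does not give their formulas. The sketch below is one possible reading, not the authors' implementation: it assumes that "2Max-Min" means 2 x max minus min over each pooling window, and that the negative image layer computes the complement of intensities scaled to [0, 1]. The function names (`pool2d`, `two_max_min_pool`, `negative_image`, `edge_branch`), the window size, and the toy image are all hypothetical.

```python
import numpy as np

def pool2d(x, size, reduce_fn):
    """Non-overlapping size x size pooling over a 2-D array (stride == size)."""
    h, w = x.shape
    h, w = h - h % size, w - w % size  # drop ragged border rows/columns
    blocks = x[:h, :w].reshape(h // size, size, w // size, size)
    return reduce_fn(blocks, axis=(1, 3))

def two_max_min_pool(x, size=2):
    """ASSUMED form of 2Max-Min pooling: 2 * max - min in each window."""
    return 2.0 * pool2d(x, size, np.max) - pool2d(x, size, np.min)

def negative_image(x, max_val=1.0):
    """ASSUMED negative layer: complement of intensities in [0, max_val]."""
    return max_val - x

def edge_branch(x, size=2):
    """Hypothetical parallel branch: negative layer, then 2Max-Min pooling."""
    return two_max_min_pool(negative_image(x), size)

# Toy 4x4 "image": a flat region on the left, a vertical edge on the right.
img = np.array([[0.2, 0.2, 0.2, 0.9],
                [0.2, 0.2, 0.2, 0.9],
                [0.2, 0.2, 0.2, 0.9],
                [0.2, 0.2, 0.2, 0.9]])

pooled = two_max_min_pool(img)   # flat windows keep 0.2, edge windows rise to 1.6
branch = edge_branch(img)        # same operation applied to the complement
print(pooled)
print(branch)
```

Under this assumption, 2 x max minus min equals max + (max minus min), so a flat window reduces to ordinary max pooling while a window containing an edge is boosted by its local contrast. That is consistent with the abstract's description of the layer as giving more attention to edge components.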

Paper Structure

This paper contains 13 sections, 16 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: The first image shows the classes of the "Imbalanced Pneumonia" dataset along with a box plot of their correlation coefficient; the second image shows the class imbalance arising from the differing number of images per class.
  • Figure 2: (a) Block diagram of the proposed framework: "VGG-Lite" $+$ Complementary Edge Enhanced Module (CEEM). The CEEM block shows how input features are transformed into negative (-Ve, complementary) and edge-enhanced features by the -Ve layer and 2Max-Min Pooling, respectively. (b) Further examples of -Ve $+$ 2Max-Min pooled images: the first column shows original images, and the second and third columns show 2Max-Min pooled images and -Ve $+$ 2Max-Min pooled images, respectively. Zoom in for better visualization.
  • Figure 3: From left to right: training accuracy vs. epochs, validation accuracy vs. epochs, and training loss vs. epochs for several models on the "Pneumonia Imbalance Dataset". (The validation loss vs. epochs graph is omitted due to considerable fluctuations and limited space in the paper.)
  • Figure 4: From left to right: confusion matrices of the Vision Transformer (ViT), the proposed VGG-Lite without attention, and the proposed VGG-Lite $+$ CEEM on the "Pneumonia Imbalance Dataset".
  • Figure 5: From left to right: ROC curves (true positive rate vs. false positive rate) of the PneuNet model, the ViT model, the proposed model without attention, and the proposed model with attention, respectively, on the "Pneumonia Imbalance Dataset". Zoom in for better visualization.
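
The macro-averaged scores quoted in the abstract can be derived from confusion matrices such as those in Figure 4. The sketch below shows the usual convention, in which each metric is computed per class and then averaged with equal class weight; whether the paper follows exactly this convention for F1 is an assumption. The function name `macro_metrics` and the toy matrix are illustrative and are not taken from the paper.

```python
import numpy as np

def macro_metrics(cm):
    """Macro-averaged precision, recall and F1 from a confusion matrix.

    cm[i, j] = number of samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    predicted = cm.sum(axis=0)  # column sums: samples predicted as each class
    actual = cm.sum(axis=1)     # row sums: samples truly in each class
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, actual, out=np.zeros_like(tp), where=actual > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)
    return precision.mean(), recall.mean(), f1.mean()

# Invented toy counts (NOT results from the paper): 100 majority-class
# samples and 10 minority-class samples.
toy_cm = np.array([[90, 10],
                   [5, 5]])

accuracy = np.trace(toy_cm) / toy_cm.sum()
macro_p, macro_r, macro_f1 = macro_metrics(toy_cm)
print(f"accuracy={accuracy:.3f} macro_recall={macro_r:.3f} macro_f1={macro_f1:.3f}")
```

In this toy case the plain accuracy is dominated by the majority class, while the macro recall drops because the minority class is recalled only half the time. This is why macro averages are the more informative summary on imbalanced CXR data.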