AGGRNet: Selective Feature Extraction and Aggregation for Enhanced Medical Image Classification

Ansh Makwe, Akansh Agrawal, Prateek Jain, Akshan Agrawal, Priyanka Bagade

TL;DR

AGGRNet targets fine-grained medical image classification, where different classes look alike (inter-class similarity) and samples of the same class differ (intra-class variability). It introduces a dedicated Feature Extraction and Aggregation (FEA) module that separates informative from non-informative features and fuses global context via cross-attention. The FEM and FAM components, aided by an adaptive threshold and a cross-stage partial channel attention (C2PCA) block, enable selective feature emphasis and global reasoning when integrated into a CNN backbone (YOLOv11). Across LIMUC, ISIC2018, Kvasir, PathMNIST, and RetinaMNIST, AGGRNet reports state-of-the-art performance, with accuracy gains of up to 5% on Kvasir and improvements on the other datasets; ablations support the contribution of each module. The intended practical impact is better disease-subtype classification and severity grading, potentially reducing subjective bias in clinical decision-making by providing more reliable and interpretable feature representations.
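
The split-then-fuse idea in the summary above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the importance score (token L2 norm), the threshold rule (mean score over the image), and the function names are assumptions chosen for illustration, and the learned projections of a real cross-attention layer are omitted.

```python
import numpy as np

def split_by_adaptive_threshold(tokens):
    """Mark each token (row) as informative or not.

    Assumption: importance is the L2 norm of the token, and the
    threshold adapts to the input as the mean importance. The paper's
    actual scoring and threshold rule may differ.
    """
    scores = np.linalg.norm(tokens, axis=1)
    threshold = scores.mean()
    return scores >= threshold, threshold

def cross_attention(queries, context):
    """Scaled dot-product attention of queries over context tokens.

    Learned query/key/value projections are left out for brevity.
    """
    d = queries.shape[1]
    logits = queries @ context.T / np.sqrt(d)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ context, weights

# Toy feature map: 16 spatial tokens with 8 channels each.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 8))

mask, thr = split_by_adaptive_threshold(tokens)
# Informative tokens query the full token set to pick up global context.
fused, weights = cross_attention(tokens[mask], tokens)
```

Here the informative tokens act as queries and the full token set acts as context, so each selected feature is re-expressed as a weighted mix of all features in the image.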

Abstract

Medical image analysis for complex tasks such as severity grading and disease subtype classification poses significant challenges due to intricate and similar visual patterns among classes, scarcity of labeled data, and variability in expert interpretations. Although existing attention-based models are useful for capturing complex visual patterns in medical image classification, their underlying architectures often fail to distinguish subtle classes because they struggle to capture inter-class similarity and intra-class variability, which can lead to incorrect diagnoses. To address this, we propose the AGGRNet framework, which extracts informative and non-informative features to capture fine-grained visual patterns and improve classification in complex medical image analysis tasks. Experimental results show that our model achieves state-of-the-art performance on various medical imaging datasets, with the largest improvement, up to 5% over SOTA models, on the Kvasir dataset.

Paper Structure

This paper contains 20 sections, 10 equations, 6 figures, 4 tables, 1 algorithm.

Figures (6)

  • Figure 1: Comparison of the proposed AGGRNet framework with the state-of-the-art HiFuse [huo2024hifuse] architecture, showing improved identification of critical regions (Grad-CAM visual results).
  • Figure 2: The proposed Feature Extraction and Aggregation (FEA) Module
  • Figure 3: Cross Stage Partial Channel Attention Block (C2PCA)
  • Figure 4: The proposed architecture AGGRNet with the novel FEA module and C2PCA block.
  • Figure 5: Class-wise Confidence Score (Predicted Probability) Comparison on LIMUC dataset between state-of-the-art CDW-CE (Inception V3) model and the proposed AGGRNet Framework
  • ...and 1 more figure