CAVE-Net: Classifying Abnormalities in Video Capsule Endoscopy

Ishita Harish, Saurav Mishra, Neha Bhadoria, Rithik Kumar, Madhav Arora, Syed Rameem Zahra, Ankur Gupta

TL;DR

CAVE-Net classifies multiple abnormality types in video capsule endoscopy by combining a CBAM-enhanced ResNet feature extractor with a latent-space-driven DNN and a Syn-XRF ensemble of SVM, RF, KNN, and XGBoost. The latent features come from a ResNet50-based autoencoder trained with a reconstruction loss, while data augmentation mitigates severe class imbalance. The three parallel models are fused via soft voting to form a classifier that outperforms the individual baselines and ranks competitively in the global Capsule Vision 2024 challenge. The approach demonstrates improved accuracy and robustness for automated GI lesion detection, with potential to aid clinical workflows in endoscopy.

Abstract

Accurate classification of medical images is critical for detecting abnormalities in the gastrointestinal tract, a domain where misclassification can significantly impact patient outcomes. We propose an ensemble-based approach to improve diagnostic accuracy in analyzing complex image datasets. Using a Convolutional Block Attention Module along with a Deep Neural Network, we leverage the distinct feature extraction capabilities of each model to enhance overall accuracy. Classification models such as Random Forest, XGBoost, Support Vector Machine, and K-Nearest Neighbors are introduced to further diversify the predictive power of the proposed ensemble. Together, these components give the proposed framework, CAVE-Net, robust feature discrimination and improved classification results. Experimental evaluations demonstrate that CAVE-Net achieves high accuracy and robustness across challenging and imbalanced classes, showing significant promise for broader applications in computer vision tasks.
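
The soft-voting fusion mentioned in the TL;DR can be illustrated with a minimal sketch. It assumes each of the three branches (CBAM-enhanced ResNet, latent-space DNN, Syn-XRF) outputs a per-class probability matrix and that the branches are weighted equally; the function name `soft_vote`, the uniform weights, and the toy probabilities below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Fuse per-model class probabilities by (weighted) averaging.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    weights:   optional per-model weights; uniform if omitted (an assumption).
    Returns the fused probabilities and the predicted class indices.
    """
    probs = np.stack(prob_list, axis=0)           # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()             # normalise so the weights sum to 1
    fused = np.tensordot(weights, probs, axes=1)  # (n_samples, n_classes)
    return fused, fused.argmax(axis=1)

# Hypothetical outputs for 2 images and 3 classes (made-up numbers).
p_cbam = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
p_dnn = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
p_synxrf = np.array([[0.3, 0.6, 0.1], [0.2, 0.2, 0.6]])

fused, labels = soft_vote([p_cbam, p_dnn, p_synxrf])
print(labels)
```

Averaging probabilities rather than counting hard votes lets a confident branch outweigh two uncertain ones, which is the usual motivation for soft voting when the base models differ as much as a CNN and a tree ensemble.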

Paper Structure

This paper contains 16 sections, 15 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: Representative images from each class in the dataset.
  • Figure 2: Angioectasia
  • Figure 3: Bleeding
  • Figure 4: Erosion
  • Figure 5: Erythema
  • ...and 8 more figures