Enhanced Multi-Class Classification of Gastrointestinal Endoscopic Images with Interpretable Deep Learning Model

Astitva Kamble, Vani Bandodkar, Saakshi Dharmadhikary, Veena Anand, Pradyut Kumar Sanki, Mei X. Wu, Biswabandhu Jana

TL;DR

This study addresses GI endoscopy image classification by developing an augmentation-free, EfficientNetB3-based network trained on the Kvasir dataset with eight classes, achieving 94.25% test accuracy. It combines strong discriminative performance with interpretability through LIME saliency maps, enabling clinicians to see which image regions drive predictions. The approach balances accuracy, computational efficiency, and transparency, making it suitable for resource-limited clinical settings. Overall, it contributes a compact, interpretable pipeline for GI endoscopy image classification with robust generalization to unseen data.

Abstract

Endoscopy serves as an essential procedure for evaluating the gastrointestinal (GI) tract and plays a pivotal role in identifying GI-related disorders. Recent advancements in deep learning have demonstrated substantial progress in detecting abnormalities through intricate models and data augmentation methods. This research introduces a novel approach to enhance classification accuracy using 8,000 labeled endoscopic images from the Kvasir dataset, categorized into eight distinct classes. Leveraging EfficientNetB3 as the backbone, the proposed architecture eliminates reliance on data augmentation while preserving moderate model complexity. The model achieves a test accuracy of 94.25%, alongside precision and recall of 94.29% and 94.24%, respectively. Furthermore, Local Interpretable Model-agnostic Explanation (LIME) saliency maps are employed to enhance interpretability by highlighting the critical regions in the images that influenced model predictions. Overall, this work highlights the importance of AI in advancing medical imaging by combining high classification accuracy with interpretability.
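The accuracy, precision, and recall figures quoted in the abstract can all be derived from a confusion matrix. The sketch below shows one common way to do that. It assumes macro averaging (an unweighted mean over classes), which the abstract does not specify, and it uses a hypothetical 3-class matrix with made-up counts rather than the paper's eight-class results.

```python
def classification_metrics(cm):
    """Accuracy plus macro-averaged precision and recall.

    cm[i][j] is the number of samples whose true class is i and whose
    predicted class is j. Macro averaging is an assumption here; the
    paper does not state which averaging scheme it used.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))

    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        precisions.append(cm[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(cm[k][k] / actual_k if actual_k else 0.0)

    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
    }


# Hypothetical counts for illustration only (10 test samples per class).
toy_cm = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 2, 8],
]
print(classification_metrics(toy_cm))
```

When every class has the same number of test samples, as in this toy matrix, macro recall coincides with accuracy, while macro precision can differ slightly because the column sums are not equal.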

Paper Structure

This paper contains 11 sections, 8 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Sample images from the Kvasir dataset, a widely used repository of GI endoscopic images designed for training and evaluating machine learning algorithms for automated diagnosis. Each labeled image corresponds to a distinct GI finding: (a) dyed-lifted polyps and (b) dyed resection margins, where staining highlights abnormalities for precise tissue removal; (c) esophagitis, characterized by inflamed esophageal tissue often associated with acid reflux; (d), (e), and (f) normal anatomical landmarks (the cecum, pylorus, and Z-line), which are essential for confirming complete visualization during endoscopy; (g) polyps, which have malignant potential; and (h) ulcerative colitis, a chronic inflammatory condition requiring ongoing monitoring. The dataset's variety supports robust model development for AI-assisted diagnostics in gastroenterology.
  • Figure 2: Block diagram representing the proposed model architecture. The design includes a base model (EfficientNetB3), additional custom layers, and a softmax layer at the output for predicting class probabilities.
  • Figure 3: Training and Validation Metrics. (a) Loss reduction trend during training. (b) Accuracy progression during training.
  • Figure 4: Confusion matrices for the proposed model using the test set.
  • Figure 5: Saliency maps generated using LIME for predictions on endoscopy images. The original endoscopic images (a, c, e, g, i, k, m, o) are paired with their corresponding LIME-based explanation maps (b, d, f, h, j, l, n, p). The yellow overlays mark the regions that most strongly influence the model’s predictions, offering insight into its decision-making process. These visualizations enhance interpretability by showing where the model focuses, which supports its potential use as an aid in diagnosing gastrointestinal conditions.
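
To make the idea behind the LIME maps in Figure 5 concrete, here is a minimal, dependency-free sketch of the general technique: switch image regions on and off at random, query the model on each perturbed image, and fit a proximity-weighted linear model whose coefficients rank the regions. This is not the authors' pipeline. Real LIME image explanations typically rely on a superpixel segmentation and a dedicated library; the square grid patches, the distance measure (fraction of patches removed), the kernel width, and the names `lime_grid_sketch`, `solve`, and `toy_predict` are all assumptions made for illustration.

```python
import math
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def lime_grid_sketch(image, predict, patch=2, n_samples=200,
                     kernel_width=0.25, ridge=1e-6, seed=0):
    """Return one importance weight per grid patch, in row-major order.

    image is a 2D list of floats whose height and width are assumed to be
    multiples of patch; predict maps such an image to a single score.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    cols = w // patch
    n_seg = (h // patch) * cols

    design, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [rng.randint(0, 1) for _ in range(n_seg)]
        # Zero out every pixel whose patch is switched off in z.
        perturbed = [[image[r][c] * z[(r // patch) * cols + c // patch]
                      for c in range(w)] for r in range(h)]
        dist = 1.0 - sum(z) / n_seg  # fraction of patches removed
        design.append([1.0] + [float(v) for v in z])  # leading 1.0: intercept
        targets.append(predict(perturbed))
        weights.append(math.exp(-(dist ** 2) / kernel_width ** 2))

    # Weighted ridge normal equations: (X^T W X + ridge * I) beta = X^T W y.
    # The tiny ridge term is also applied to the intercept for simplicity.
    d = n_seg + 1
    a = [[sum(wt * x[i] * x[j] for x, wt in zip(design, weights))
          + (ridge if i == j else 0.0) for j in range(d)] for i in range(d)]
    b = [sum(wt * x[i] * y for x, wt, y in zip(design, weights, targets))
         for i in range(d)]
    return solve(a, b)[1:]  # drop the intercept


# Toy stand-in for a classifier score: it only looks at the top-left patch.
def toy_predict(img):
    return (img[0][0] + img[0][1] + img[1][0] + img[1][1]) / 4


toy_image = [[0.8] * 4 for _ in range(4)]
patch_weights = lime_grid_sketch(toy_image, toy_predict)
# Patch 0 (top-left) should receive by far the largest weight.
print(patch_weights)
```

In an overlay such as the one in Figure 5, the highest-weighted regions would be the ones highlighted. For a real classifier, `predict` would return the predicted probability of the class being explained.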