EffiSegNet: Gastrointestinal Polyp Segmentation through a Pre-Trained EfficientNet-based Network with a Simplified Decoder

Ioannis A. Vezakis, Konstantinos Georgas, Dimitrios Fotiadis, George K. Matsopoulos

TL;DR

EffiSegNet addresses gastrointestinal polyp segmentation under data-limited conditions by pairing a pre-trained EfficientNet backbone with a simplified decoder and full-scale feature fusion. On the Kvasir-SEG dataset, EffiSegNet-B4 reaches F1 = 0.9552, mDice = 0.9483, and mIoU = 0.9056, which the authors report as state of the art. Training from scratch reaches F1 = 0.9286, mDice = 0.9207, and mIoU = 0.8668, so pre-training adds roughly 0.027 in F1. The results suggest that encoder design and transfer learning matter more than decoder complexity, and that a lightweight, scalable architecture can outperform more complex models such as DUCK-Net on key metrics. The framework can accommodate other backbones, more robust polyp segmentation could support colorectal cancer screening, and the code and data splits are released publicly.
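One way to picture "full-scale feature fusion" with a simplified decoder is sketched below in NumPy. This summary does not describe the exact decoder layers, so everything here is an assumption made for illustration: each encoder stage is projected to a common channel count with a 1x1 convolution, upsampled to the output resolution with nearest-neighbour interpolation, and the results are summed. The helper names, channel counts, and spatial sizes are hypothetical.

```python
import numpy as np


def project(x, weight):
    """1x1 convolution: map (C_in, H, W) to (C_out, H, W) with a (C_out, C_in) weight."""
    return np.einsum("oc,chw->ohw", weight, x)


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse_full_scale(features, weights, out_size):
    """Bring every encoder stage to out_size x out_size and sum them.

    Assumes square feature maps whose side length divides out_size.
    """
    fused = np.zeros((weights[0].shape[0], out_size, out_size))
    for feat, weight in zip(features, weights):
        factor = out_size // feat.shape[1]
        fused += upsample_nearest(project(feat, weight), factor)
    return fused


# Hypothetical encoder outputs: three stages with different channel counts and strides.
rng = np.random.default_rng(0)
stage_features = [
    rng.standard_normal((24, 32, 32)),
    rng.standard_normal((40, 16, 16)),
    rng.standard_normal((112, 8, 8)),
]
# One hypothetical 1x1 projection per stage, all mapping to 32 channels.
stage_weights = [rng.standard_normal((32, f.shape[0])) for f in stage_features]

fused_map = fuse_full_scale(stage_features, stage_weights, out_size=64)
print(fused_map.shape)  # (32, 64, 64)
```

In a design like this the decoder adds only one small projection per stage, with no mirrored upsampling path of stacked convolutions, which is the kind of saving in parameters and computation that a simplified decoder aims for.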

Abstract

This work introduces EffiSegNet, a novel segmentation framework leveraging transfer learning with a pre-trained Convolutional Neural Network (CNN) classifier as its backbone. Deviating from traditional architectures with a symmetric U-shape, EffiSegNet simplifies the decoder and utilizes full-scale feature fusion to minimize computational cost and the number of parameters. We evaluated our model on the gastrointestinal polyp segmentation task using the publicly available Kvasir-SEG dataset, achieving state-of-the-art results. Specifically, the EffiSegNet-B4 network variant achieved an F1 score of 0.9552, mean Dice (mDice) 0.9483, mean Intersection over Union (mIoU) 0.9056, Precision 0.9679, and Recall 0.9429 with a pre-trained backbone - to the best of our knowledge, the highest reported scores in the literature for this dataset. Additional training from scratch also demonstrated exceptional performance compared to previous work, achieving an F1 score of 0.9286, mDice 0.9207, mIoU 0.8668, Precision 0.9311 and Recall 0.9262. These results underscore the importance of a well-designed encoder in image segmentation networks and the effectiveness of transfer learning approaches.
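The reported metrics can all be derived from per-pixel true positives, false positives, and false negatives. The sketch below shows one common way to compute them with NumPy. It assumes that mDice and mIoU are averaged per image while Precision, Recall, and F1 are computed from counts pooled over the whole test set, which is one possible explanation for F1 and mDice differing. Whether the paper follows exactly this protocol is an assumption, and the function names are hypothetical.

```python
import numpy as np


def confusion_counts(pred, target):
    """Return (TP, FP, FN) pixel counts for one binary mask pair."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = int(np.sum(pred & target))
    fp = int(np.sum(pred & ~target))
    fn = int(np.sum(~pred & target))
    return tp, fp, fn


def dice(tp, fp, fn, eps=1e-7):
    """Dice coefficient 2TP / (2TP + FP + FN); eps guards against empty masks."""
    return (2 * tp + eps) / (2 * tp + fp + fn + eps)


def iou(tp, fp, fn, eps=1e-7):
    """Intersection over Union TP / (TP + FP + FN)."""
    return (tp + eps) / (tp + fp + fn + eps)


def evaluate(preds, targets):
    """Per-image mean Dice/IoU plus precision, recall, and F1 from pooled counts."""
    counts = [confusion_counts(p, t) for p, t in zip(preds, targets)]
    m_dice = float(np.mean([dice(*c) for c in counts]))
    m_iou = float(np.mean([iou(*c) for c in counts]))

    # Pool the counts over all images before forming the ratios.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"F1": f1, "mDice": m_dice, "mIoU": m_iou,
            "Precision": precision, "Recall": recall}


# Two toy 2x2 masks: one with a false positive, one predicted perfectly.
toy_preds = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
toy_targets = [[[1, 0], [0, 0]], [[1, 1], [1, 1]]]
print(evaluate(toy_preds, toy_targets))
```

On the toy masks the pooled F1 (10/11) is higher than the per-image mean Dice (5/6), because the larger, perfectly segmented mask dominates the pooled counts.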

Paper Structure

This paper contains 7 sections, 2 equations, 1 figure, and 3 tables.

Figures (1)

  • Figure 1: The EffiSegNet architecture. A pre-trained EfficientNet model serves as the backbone of the network, which can be scaled up or down through compound scaling.
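
For context on compound scaling: in the original EfficientNet formulation, a single coefficient φ scales network depth, width, and input resolution together as α^φ, β^φ, and γ^φ, with α = 1.2, β = 1.1, γ = 1.15 chosen so that α·β²·γ² ≈ 2. The sketch below only illustrates that rule. The released B1 to B7 variants use rounded multipliers that do not follow the formula exactly, and which φ corresponds to EffiSegNet-B4 is not stated in this summary, so the function should be read as an approximation and its name is hypothetical.

```python
# Base coefficients from the EfficientNet compound-scaling rule.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15


def compound_scaling(phi):
    """Return approximate depth, width, resolution, and FLOPs multipliers for phi."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
        # FLOPs grow roughly with depth * width^2 * resolution^2.
        "flops": (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi,
    }


for phi in range(5):
    m = compound_scaling(phi)
    print(f"phi={phi}: depth x{m['depth']:.2f}, width x{m['width']:.2f}, "
          f"resolution x{m['resolution']:.2f}, FLOPs x{m['flops']:.2f}")
```

Because α·β²·γ² is close to 2, each unit increase in φ roughly doubles the compute of the backbone, which is what lets a single coefficient trade accuracy against cost.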