Ensemble architecture in polyp segmentation

Hao-Yun Hsu; Yi-Ching Cheng; Guan-Hua Huang

TL;DR

We present an integrated framework that fuses the features learned by convolutional and transformer models for prediction, yielding an ensemble technique that combines the strengths of both model families to improve polyp segmentation performance.

Abstract

This study explores semantic segmentation architectures and evaluates models that excel in polyp segmentation. We present an integrated framework that combines the strengths of different models. Specifically, the framework fuses the features learned by convolutional and transformer models for prediction, yielding an ensemble technique that improves model performance. In our polyp segmentation experiments, the proposed architecture surpassed other top models, exhibiting improved learning capacity and resilience. The code is available at https://github.com/HuangDLab/EnFormer.
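As a rough illustration of what fusing features from two branches can look like, the sketch below concatenates a convolutional feature map and a transformer feature map along the channel axis and mixes them with a per-pixel linear projection (equivalent to a bias-free 1x1 convolution). The names `fuse_features`, `conv_feat`, `trans_feat`, and `weights`, as well as the concatenate-then-project design, are assumptions made for this example only; the authors' actual fusion is defined in the linked repository and may differ.

```python
from typing import List

# A feature map is indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def fuse_features(conv_feat: FeatureMap, trans_feat: FeatureMap,
                  weights: List[List[float]]) -> FeatureMap:
    """Hypothetical fusion step: concatenate two feature maps on the
    channel axis, then apply a per-pixel linear projection.

    weights[o][c] is the contribution of concatenated channel c to
    output channel o (a bias-free 1x1 convolution).
    """
    stacked = conv_feat + trans_feat  # channel-wise concatenation
    height, width = len(stacked[0]), len(stacked[0][0])
    # Both branches must produce maps with the same spatial size.
    for channel in stacked:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("both branches must share the same spatial size")
    # Each output channel needs one weight per concatenated input channel.
    if any(len(row) != len(stacked) for row in weights):
        raise ValueError("each weight row needs one entry per concatenated channel")
    return [
        [[sum(w[c] * stacked[c][y][x] for c in range(len(stacked)))
          for x in range(width)]
         for y in range(height)]
        for w in weights
    ]


# Toy input: one channel per branch on a 2x2 grid, averaged into one channel.
conv_feat = [[[1.0, 2.0], [3.0, 4.0]]]
trans_feat = [[[3.0, 2.0], [1.0, 0.0]]]
fused = fuse_features(conv_feat, trans_feat, weights=[[0.5, 0.5]])
```

In a trained model the projection weights would be learned rather than fixed, and the fusion would typically be repeated at several encoder depths; equal weights are used here only to keep the toy example easy to follow.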

Paper Structure (14 sections, 4 equations, 3 figures, 7 tables)

Introduction
Related work
Semantic segmentation
Polyp segmentation
Architecture in FCBFormer
Methods
Proposed method
Experiments
Dataset
Training configuration
Evaluation metrics
Result analysis
Visualization
Conclusion

Figures (3)

Figure 1: EnFormer. Green and blue areas represent the encoder configurations of the convolution branch and the transformer branch, respectively. Gray indicates the fuse decoder (FD), which combines the features generated by both branches.
Figure 2: EnFormer-Lite. The architecture is similar to EnFormer; however, it lacks a decoding strategy for each encoder and is thus a lighter version of EnFormer.
Figure 3: Grad-CAM visualizations for the five datasets. The first three columns (image, mask, and fusion) highlight the regions in the original image identified by the segmentation mask. The columns $e_1^1$, $e_2^1$, and $\mathcal{F}^1(e_1^1, e_2^1)$ correspond to the average outputs of the first block of the transformer encoder, convolution encoder, and FD, respectively. The columns $d_1$, $d_2$, and $\mathcal{F}([e^j_i])$ represent the Grad-CAM visualizations of the last layer of the transformer decoder, convolution decoder, and fuse decoder, respectively. Column $\mathcal{S}$ represents the Grad-CAM visualization of the segmentation head.
