Ensemble architecture in polyp segmentation

Hao-Yun Hsu, Yi-Ching Cheng, Guan-Hua Huang

TL;DR

We present an integrated framework that combines the strengths of different models: it fuses the features learned by convolutional and transformer models for prediction, yielding an ensemble technique that improves model performance.

Abstract

This study explored the architecture of semantic segmentation and evaluated models that excel in polyp segmentation. We present an integrated framework that harnesses the advantages of different models to attain an optimal outcome. Specifically, in this framework, we fuse the learned features from convolutional and transformer models for prediction, thus engendering an ensemble technique to enhance model performance. Our experiments on polyp segmentation revealed that the proposed architecture surpassed other top models, exhibiting improved learning capacity and resilience. The code is available at https://github.com/HuangDLab/EnFormer.
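
To make the idea of fusing features from a convolutional branch and a transformer branch concrete, the following is a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). The fusion operator used here, channel concatenation followed by a 1x1 projection and ReLU, is an assumption chosen for illustration, and the name `fuse_features`, the shapes, and the random inputs are all hypothetical.

```python
import numpy as np


def fuse_features(conv_feat, trans_feat, weight, bias):
    """Fuse two (C, H, W) feature maps into one (C_out, H, W) map.

    Assumed fusion operator (illustrative only): concatenate along the
    channel axis, apply a 1x1 projection, then ReLU.
    """
    if conv_feat.shape[1:] != trans_feat.shape[1:]:
        raise ValueError("both branches must share the same spatial size")
    stacked = np.concatenate([conv_feat, trans_feat], axis=0)  # (C1 + C2, H, W)
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    fused = np.einsum("oc,chw->ohw", weight, stacked) + bias[:, None, None]
    return np.maximum(fused, 0.0)  # ReLU


# Hypothetical stand-ins for encoder outputs at one resolution.
rng = np.random.default_rng(0)
conv_feat = rng.standard_normal((8, 16, 16))   # convolution branch: 8 channels
trans_feat = rng.standard_normal((4, 16, 16))  # transformer branch: 4 channels
weight = rng.standard_normal((6, 12)) * 0.1    # projects 8 + 4 channels to 6
bias = np.zeros(6)

fused = fuse_features(conv_feat, trans_feat, weight, bias)
print(fused.shape)  # (6, 16, 16)
```

In a real model the projection weights would be learned jointly with both encoders, and the two branches would typically need resampling to a common spatial size before fusion.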

Paper Structure

This paper contains 14 sections, 4 equations, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: EnFormer. Green and blue areas represent the different encoder configurations for the convolution branch and the transformer branch, respectively. Gray indicates the fuse decoder (FD), which combines the features generated by the convolution and transformer branches.
  • Figure 2: EnFormer-Lite. The architecture is similar to EnFormer; however, it lacks a decoding strategy for each encoder and is thus a lighter version of EnFormer.
  • Figure 3: Grad-CAM visualizations for the five datasets. The first three columns (image, mask, and fusion) highlight the regions in the original image identified by the segmentation mask. The columns $e_1^1$, $e_2^1$, and $\mathcal{F}^1(e_1^1, e_2^1)$ correspond to the average outputs of the first block of the transformer encoder, convolution encoder, and FD, respectively. The columns $d_1$, $d_2$, and $\mathcal{F}([e^j_i])$ represent the Grad-CAM visualizations of the last layer of the transformer decoder, convolution decoder, and fuse decoder, respectively. Column $\mathcal{S}$ represents the Grad-CAM visualization of the segmentation head.
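
For readers unfamiliar with the Grad-CAM maps shown in Figure 3, the sketch below illustrates the standard Grad-CAM computation in NumPy: channel weights are the spatially averaged gradients, and the heatmap is the ReLU of the weighted sum of activations. How the authors chose the target score and layers is not stated in this section, so the inputs here are assumptions: `activations` and `gradients` are hypothetical arrays standing in for one layer's feature maps and the gradient of a scalar score (for segmentation, for example, the summed mask logits) with respect to them.

```python
import numpy as np


def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap from one layer's feature maps.

    activations: (K, H, W) feature maps of the chosen layer.
    gradients:   (K, H, W) gradient of a scalar score w.r.t. those maps.
    Returns an (H, W) heatmap scaled to [0, 1] (all zeros if nothing is positive).
    """
    weights = gradients.mean(axis=(1, 2))  # one weight per channel
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Tiny hypothetical example: two channels on a 2x2 grid.
activations = np.array([[[1.0, 0.0], [0.0, 2.0]],
                        [[0.0, 1.0], [0.0, 1.0]]])
gradients = np.stack([np.ones((2, 2)), -np.ones((2, 2))])  # channel weights +1 and -1
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

In practice the activations and gradients would come from a forward and backward pass through the trained network (for instance via framework hooks), and the heatmap would be upsampled to the input resolution before being overlaid on the image.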