MugenNet: A Novel Combined Convolution Neural Network and Transformer Network with its Application for Colonic Polyp Image Segmentation

Chen Peng, Zhiqin Qian, Kunyu Wang, Qi Luo, Zhuming Bi, Wenjun Zhang

TL;DR

This work addresses the challenge of accurate and fast colonic polyp segmentation by introducing MugenNet, a hybrid network that fuses a CNN (ResNet-34) and a Transformer (ViT) in parallel. A Mugen module with attention mechanisms aggregates features from both branches, guided by a residual architecture and attention gates, and trained with a composite IoU and BCE loss. Across five public polyp datasets, MugenNet achieves state-of-the-art or near-state-of-the-art accuracy, demonstrates robust generalization to unseen data, and delivers real-time inference (~56 fps), underscoring the practical potential for real-time colonoscopy assistance. Ablation studies confirm the complementary value of both branches and the fusion module. The results suggest a generalizable hybrid learning approach for combining complementary models in medical image segmentation and beyond.
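The composite IoU and BCE loss mentioned above can be illustrated with a minimal sketch. The summary does not give the exact formulation, so everything here is an assumption made for illustration: the soft (probability-based) IoU, the equal default weights `w_iou` and `w_bce`, and the function names. The paper's own loss may differ, for example by weighting pixels.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU; intersection and union are computed on probabilities."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)

def composite_loss(pred, target, w_iou=1.0, w_bce=1.0):
    """Weighted sum of the two terms (weights are hypothetical defaults)."""
    return w_iou * soft_iou_loss(pred, target) + w_bce * bce_loss(pred, target)

# Flattened 2x2 mask: 1 = polyp pixel, 0 = background.
mask = [1.0, 0.0, 1.0, 0.0]
good = [0.9, 0.1, 0.8, 0.2]   # prediction close to the mask
bad = [0.2, 0.8, 0.1, 0.9]    # prediction mostly inverted
print(composite_loss(good, mask) < composite_loss(bad, mask))
```

The IoU term scores the overlap of the whole predicted region, while the BCE term penalizes each pixel independently, which is a common reason for pairing the two in segmentation training.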

Abstract

Biomedical image segmentation is an important part of disease diagnosis. The term "colonic polyps" refers to polypoid lesions that occur on the surface of the colonic mucosa within the intestinal lumen. In clinical practice, polyps are detected early through colonoscopy examinations and biomedical image processing, so accurate polyp image segmentation is of great significance in colonoscopy. The Convolutional Neural Network (CNN) is a common automatic segmentation method, but its main disadvantage is its long training time. The Transformer uses a self-attention mechanism, which assigns a different importance weight to each piece of information and thereby achieves high computational efficiency during segmentation; a potential drawback, however, is the risk of information loss. In this study, following the well-known hybridization principle, we propose a method that combines a CNN and a Transformer to retain the strengths of both, and we apply it to build a system called MugenNet for colonic polyp image segmentation. We conducted a comprehensive experiment comparing MugenNet with other CNN models on five publicly available datasets, together with an ablation experiment on MugenNet. The experimental results show that MugenNet achieves significantly higher processing speed and accuracy than a CNN alone. The broader implication of our work is a method for optimally combining two complementary machine learning methods.

Paper Structure (14 sections, 11 equations, 7 figures, 7 tables)

This paper contains 14 sections, 11 equations, 7 figures, 7 tables.

Figures (7)

  • Figure 1: The architecture of MugenNet, which combines the Transformer branch (left) with the CNN branch (right) in the Mugen module (middle). FC: Fully Connected. AG: Attention Gate. SE: Squeeze and Excitation.
  • Figure 2: The architecture of the attention gate.
  • Figure 3: Comparison of MugenNet with four other networks (U-Net, U-Net++, SFA, PraNet) on the Kvasir dataset.
  • Figure 4: Comparison of the performance of MugenNet on the five datasets (ClinicDB, ColonDB, CVC-300, ETIS, Kvasir).
  • Figure 5: Training process on four datasets.
  • ...and 2 more figures