Trans2Unet: Neural fusion for Nuclei Semantic Segmentation
Dinh-Phu Tran, Quoc-Anh Nguyen, Van-Truong Pham, Thi-Thao Tran
TL;DR
The paper tackles semantic segmentation of nuclei, where overlapping nuclei make individual instances hard to separate. It introduces Trans2Unet, a two-branch architecture that combines Unet and TransUnet, adds the WASP-KC module to the TransUnet branch to improve efficiency, and fuses the two branches' outputs for the final prediction. Results on the 2018 Data Science Bowl (DSC = 0.9225, IoU = 0.8613) and GlaS (DSC = 0.8994, IoU = 0.8254) datasets are competitive with prior segmentation models, supporting the hybrid CNN-Transformer approach. The work highlights the benefit of integrating local and global features for accurate segmentation and suggests potential extensions to broader medical image segmentation tasks.
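For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how the Dice similarity coefficient (DSC) and intersection over union (IoU) are commonly computed for a single pair of binary masks. The function name `dice_and_iou`, the nested-list mask format, and the empty-mask convention are assumptions made for illustration; the paper's own evaluation code may differ.

```python
def dice_and_iou(pred, target):
    """Return (Dice, IoU) for two equally sized binary masks (nested lists of 0/1)."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixels that are foreground in both masks, and in each mask separately.
    intersection = sum(1 for p, t in zip(flat_pred, flat_target) if p and t)
    pred_area = sum(1 for p in flat_pred if p)
    target_area = sum(1 for t in flat_target if t)
    union = pred_area + target_area - intersection

    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Toy 2x3 masks (hypothetical data, not from the paper).
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
dice, iou = dice_and_iou(pred, target)
```

For a single pair of masks the two metrics are tied by Dice = 2·IoU / (1 + IoU). Scores averaged over a whole dataset, such as those quoted above, need not satisfy this identity exactly.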
Abstract
Nuclei segmentation, despite its fundamental role in histopathological image analysis, is still a challenging task. The main difficulty is the existence of overlapping areas, which makes separating individual nuclei more complicated. In this paper, we propose a new two-branch architecture that combines the Unet and TransUnet networks for the nuclei segmentation task. In the proposed architecture, named Trans2Unet, the input image is first sent into the Unet branch, whose last convolution layer is removed. This branch lets the network combine features from different spatial regions of the input image and localize the regions of interest more precisely. The input image is also fed into the second branch, called the TransUnet branch, where it is divided into image patches. With a Vision Transformer (ViT) in its architecture, TransUnet can serve as a powerful encoder for medical image segmentation tasks and enhance image details by recovering localized spatial information. To boost the efficiency and performance of Trans2Unet, we propose to infuse TransUnet with a computationally efficient variation called the "Waterfall" Atrous Spatial Pooling with Skip Connection (WASP-KC) module, which is inspired by the "Waterfall" Atrous Spatial Pooling (WASP) module. Experimental results on the 2018 Data Science Bowl benchmark demonstrate the effectiveness of the proposed architecture compared with previous segmentation models.
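The patch division step in the TransUnet branch can be illustrated with a minimal sketch. It assumes non-overlapping square patches flattened in row-major order, which is the usual ViT convention; the abstract does not state the patch size or layout, so the function name `split_into_patches`, the patch size, and the toy image below are hypothetical.

```python
def split_into_patches(image, patch_size):
    """Split a 2D image (nested lists) into flattened, non-overlapping square patches."""
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")

    patches = []
    # Walk the patch grid top-to-bottom, left-to-right.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch in row-major order into a single token vector.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Toy 4x4 single-channel image whose pixel values equal their row-major index.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, 2)
```

Each flattened patch plays the role of one input token; in a ViT-style encoder these tokens would then be linearly projected and combined with position embeddings before the Transformer layers.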
