A hybrid approach for improving U-Net variants in medical image segmentation
Aitik Gupta, Joydip Dhar
TL;DR
The paper addresses the need for efficient medical image segmentation by reducing the number of trainable parameters in prominent U‑Net variants without sacrificing accuracy. It combines depthwise separable convolutions with attention in the skip connections and residual connections to create a hybrid architecture that lowers parameter counts while maintaining or improving performance on skin lesion and thyroid gland segmentation. Ablation studies and comparisons against U‑Net, Attention U‑Net, and MultiResUNet show improved IoU and Dice scores with substantially fewer parameters. The work offers a practical path toward faster inference and deployment of segmentation models in clinical settings, especially where computational resources are limited.
Abstract
Medical image segmentation is vital to medical imaging because it enables professionals to examine and interpret the information offered by different imaging modalities more accurately. It is the task of partitioning a medical image into segments or regions of interest. The resulting segmented images serve many purposes, including diagnosis, surgery planning, and therapy evaluation. In the initial phase of this research, the major focus was on reviewing existing deep-learning approaches, including MultiResUNet, Attention U-Net, the classical U-Net, and other variants. Most of these variants increase accuracy by using attention feature vectors or maps, which dynamically assign larger weights to critical information, but this raises their network parameter requirements. Because their number of trainable parameters is very high, they are prone to problems such as overfitting, and their inference time is also high. Therefore, the aim of this research is to reduce the network parameter requirements using depthwise separable convolutions, combined with an attention mechanism and residual connections, while maintaining performance on medical image segmentation tasks such as skin lesion segmentation.
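To make the parameter saving concrete, the sketch below counts the trainable parameters of a standard convolution and of a depthwise separable convolution (a per-channel k x k depthwise convolution followed by a 1 x 1 pointwise convolution). It is a minimal illustration of the general technique, not the architecture from the paper: the function names, the 3 x 3 kernel, the bias terms, and the 64 and 128 channel counts are all assumptions chosen for the example.

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Trainable parameters in a standard k x k convolution."""
    # One k x k x c_in filter per output channel, plus one bias each.
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k=3, bias=True):
    """Trainable parameters in a depthwise k x k conv plus a 1 x 1 pointwise conv."""
    # Depthwise step: a single k x k filter for each input channel.
    depthwise = k * k * c_in + (c_in if bias else 0)
    # Pointwise step: a 1 x 1 conv that mixes channels from c_in to c_out.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    c_in, c_out = 64, 128
    standard = conv_params(c_in, c_out)
    separable = depthwise_separable_params(c_in, c_out)
    print(f"standard conv:       {standard} parameters")
    print(f"depthwise separable: {separable} parameters")
    print(f"reduction factor:    {standard / separable:.2f}x")
```

For a k x k kernel, the ratio of separable to standard parameters is roughly 1/c_out + 1/k², so the saving grows with the number of output channels and the kernel size.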
