Medical Image Segmentation Review: The success of U-Net

Reza Azad; Ehsan Khodapanah Aghdam; Amelie Rauland; Yiwei Jia; Atlas Haddadi Avval; Afshin Bozorgpour; Sanaz Karimijafarbigloo; Joseph Paul Cohen; Ehsan Adeli; Dorit Merhof

TL;DR

The paper surveys U-Net and its numerous variants for medical image segmentation, proposing a taxonomy that organizes extensions by where the design changes occur (skip connections, backbones, bottlenecks, transformers, rich representations, and probabilistic design). It compares methods on public datasets, highlighting transformer-based designs (e.g., UCTransNet, MISSFormer) that improve global context at the cost of computation, and offers open-source resources to foster reproducibility. The study emphasizes practical deployment considerations, including memory efficiency, interpretability, federated learning, and software ecosystems, to guide clinicians and researchers toward scalable, robust segmentation tools. Overall, it provides a comprehensive framework to select and extend U-Net architectures for diverse medical imaging tasks and data regimes.

Abstract

Automatic medical image segmentation is a crucial topic in the medical domain and, in turn, a critical component of the computer-aided diagnosis paradigm. U-Net is the most widespread image segmentation architecture due to its flexibility, optimized modular design, and success across all medical imaging modalities. Over the years, the U-Net model has received tremendous attention from academic and industrial researchers. Several extensions of this network have been proposed to address the scale and complexity of medical tasks. Understanding the deficiencies of the naive U-Net model is the first step for vendors in choosing the proper U-Net variant for their business. Having a compendium of different variants in one place makes it easier for builders to identify the relevant research. It also helps ML researchers understand the challenges that biological tasks pose to the model. To this end, we discuss the practical aspects of the U-Net model and suggest a taxonomy to categorize each network variant. Moreover, to measure the performance of these strategies in a clinical application, we propose fair evaluations of some unique and famous designs on well-known datasets. We provide a comprehensive implementation library with trained models for future research. In addition, for ease of future studies, we created an online list of U-Net papers with their official implementations where available. All information is gathered in the https://github.com/NITR098/Awesome-U-Net repository.

Paper Structure (54 sections, 28 equations, 37 figures, 3 tables)


Introduction
Taxonomy
2D U-Net
3D U-Net
Clinical Importance and Effect of U-Net
U-Net Extensions
Skip Connection Enhancements
Increasing the Number of Skip Connections
Processing Feature Maps within the Skip Connections
Combination of Encoder and Decoder Feature Maps
Backbone Design Enhancements
Residual Backbone
Multi-Resolution blocks
Re-considering Convolution
Recurrent Architecture
...and 39 more sections

Figures (37)

Figure 1: The number of research works published in the past decade using the U-Net model as their baseline to address various medical image analysis challenges. The visualization shows substantial attention from the research and industry community for this architecture, particularly for the segmentation task, which is the main objective of this review paper.
Figure 2: The proposed U-Net taxonomy categorizes different extensions of the U-Net model based on their underlying design idea. More specifically, our taxonomy takes into account the modular design of the U-Net model and shows where the improvement happens (e.g., skip connection). For clarity and consistency in naming the studies, we use abbreviations. Each prefix number denotes: 1. chen2021transunet, 2. wang2021transbts, 3. li2021gt, 4. xie2021cotr, 5. hatamizadeh2022swin, 6. wang2022mixed, 7. reza2022contextual, 8. wang2022uctransnet, 9. huang2022scaleformer, 10. valanarasu2021medical, 11. cao2021swin, 12. huang2021missformer, 13. wu2022d, 14. brudfors2021mrf, 15. klug2020bayesian, 16. myronenko20183d, 17. kohl2018probabilistic, 18. abraham2019novel, 19. fu2018joint, 20. moradi2019mfp, 21. dolz2018dense, 22. lachinov2018glioma, 23. islam2019brain, 24. drozdzal2016importance, 25. milletari2016v, 26. li2018h, 27. ibtehaz2020multiresunet, 28. karaali2022dr, 29. jha2020doubleu, 30. jin2019dunet, 31. chen2018s3d, 32. kou2019microaneurysms, 33. nasrin2019medical, 34. zhou2019unet++, 35. huang2020unet, 36. xiang2020bio, 37. oktay2018attention, 38. jin2020ra, 39. li2020attention, 40. lachinov2021projective, 41. azad2019bi, 42. li2019cr, 43. fan2020ma, 44. guo2021sa, 45. zeng2019ric, 46. azad2021deep, 47. azad2022smu, 48. hai2019fully, 49. wang2020noise, 50. wu2021jcs.
Figure 3: The initial 2D U-Net architecture, designed to cope with the semantic segmentation challenge. Figure from ronneberger2015u-arx.
Figure 4: Samples of 3D medical datasets, each with a single selected 2D frame, where the target area (e.g., organ) is highlighted using the annotation mask. c.1) Cervical spine kaggle-cervical-spine-fracture-detection, c.2) Lung mader_2017, c.3) Fourteen abdominal organs landman2015miccai, c.4) Brain simpson2019large, menze2014multimodal, c.5) Heart simpson2019large, c.6) Hepatic vessel simpson2019large, c.7) Liver simpson2019large, c.8) Pancreas simpson2019large, c.9) Prostate simpson2019large.
Figure 5: A detailed overview of the U-Net's core involvement in medical image analysis and clinical use, illustrating how U-Nets are involved in clinical decisions as discussed in research papers. The first block deals with image acquisition, preparation, and pre-processing steps to provide the data in a common format for the deep neural network. The second block uses a neural architecture search algorithm to find an efficient architecture for the task at hand, while the third block performs post-processing operations to further refine the network output. Finally, the application block uses the software output to assist specialists with a certain action (e.g., tumor growth monitoring).
...and 32 more figures
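
The modular parts named in the taxonomy (encoder backbone, bottleneck, skip connections, decoder) can be made concrete by tracing feature-map sizes through the initial 2D U-Net of Figure 3. The following is a minimal sketch, not the authors' code: the function name `trace_unet_sizes` and its parameters are hypothetical, and it assumes the layout of the original U-Net (two unpadded 3x3 convolutions per block, 2x2 max pooling, 2x up-sampling, and center-cropped skip connections). It tracks only the spatial side length, not channels or weights.

```python
def trace_unet_sizes(input_size, depth=4, convs_per_block=2, kernel=3, padded=False):
    """Trace the side length of square feature maps through a U-Net-style network.

    Assumptions (illustrative only): every block applies `convs_per_block`
    convolutions, pooling halves the size, up-sampling doubles it, and each
    encoder map is center-cropped to the decoder size before concatenation.
    """
    # Each unpadded convolution removes (kernel - 1) pixels per side length.
    shrink = 0 if padded else convs_per_block * (kernel - 1)

    size = input_size
    skips = []  # encoder outputs that feed the skip connections
    for _ in range(depth):
        size -= shrink
        if size <= 0 or size % 2:
            raise ValueError("input size does not survive pooling cleanly")
        skips.append(size)
        size //= 2  # 2x2 max pooling

    size -= shrink  # bottleneck block
    if size <= 0:
        raise ValueError("input size too small for this depth")
    bottleneck = size

    decoder = []
    for skip in reversed(skips):
        upsampled = size * 2           # 2x up-sampling
        cropped = skip - upsampled     # pixels cropped from the skip feature map
        if cropped < 0:
            raise ValueError("skip feature map smaller than decoder feature map")
        size = upsampled - shrink      # decoder block after concatenation
        if size <= 0:
            raise ValueError("input size too small for this depth")
        decoder.append({"skip": skip, "upsampled": upsampled,
                        "cropped": cropped, "out": size})

    return {"skips": skips, "bottleneck": bottleneck,
            "decoder": decoder, "output": size}


if __name__ == "__main__":
    trace = trace_unet_sizes(572)
    print("skip sizes:", trace["skips"])
    print("bottleneck:", trace["bottleneck"])
    print("output:", trace["output"])  # 388 for the original 572x572 input tile
```

With `padded=True` the output size equals the input size, which is the convention many later variants adopt; the taxonomy categories then correspond to changing what happens in the encoder blocks, the bottleneck, or along the `skips` paths.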
