
Deep Semantic Segmentation of Natural and Medical Images: A Review

Saeid Asgari Taghanaki, Kumar Abhishek, Joseph Paul Cohen, Julien Cohen-Adad, Ghassan Hamarneh

TL;DR

Semantic image segmentation faces data scarcity and class imbalance challenges, especially in medical imaging. The paper surveys six families of deep learning approaches—architectural improvements, data synthesis-based methods, loss-function design, sequenced models, weakly supervised strategies, and multi-task frameworks—and discusses their applicability to natural and medical images, including representative methods such as FCNs, encoder–decoder networks, attention modules, and adversarial training, along with key loss variants like Dice, Focal, and Lovász-Softmax. It also covers GAN-based data augmentation, semi-/weak supervision, and multi-task learning, and outlines future directions such as neural architecture search, multi-modal data fusion, and physics-based data generation to advance robust medical segmentation. The review highlights the need for standardized benchmarks and scalable methods that generalize across diverse modalities and clinical settings, guiding researchers toward practical, high-impact advancements in semantic segmentation.
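Dice and Focal are two of the loss variants the review groups under loss-function design for class imbalance. The sketch below is a minimal NumPy illustration for the binary (foreground/background) case. It assumes predictions are already per-pixel probabilities, for example sigmoid outputs. The function names, the `eps` smoothing terms, and the defaults `alpha=0.25`, `gamma=2.0` are illustrative choices, not taken from the review.

```python
import numpy as np

def soft_dice_loss(probs, target, eps=1e-6):
    """Soft Dice loss for a binary mask: 1 - 2|P*G| / (|P| + |G|).

    probs:  per-pixel foreground probabilities in [0, 1]
    target: binary ground-truth mask of the same shape
    eps:    small smoothing term (an assumption) to avoid division by zero
    """
    p = probs.astype(np.float64).ravel()
    g = target.astype(np.float64).ravel()
    intersection = np.sum(p * g)
    denom = np.sum(p) + np.sum(g)
    return 1.0 - (2.0 * intersection + eps) / (denom + eps)

def focal_loss(probs, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Binary focal loss averaged over pixels.

    Down-weights well-classified pixels by (1 - p_t)**gamma, so the many
    easy background pixels contribute less than the rare hard ones.
    """
    p = np.clip(probs.astype(np.float64), eps, 1.0 - eps)
    t = target.astype(np.float64)
    p_t = t * p + (1.0 - t) * (1.0 - p)              # probability of the true class
    alpha_t = t * alpha + (1.0 - t) * (1.0 - alpha)  # class-balancing weight
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# Imbalanced toy mask: 1 foreground pixel out of 16.
target = np.zeros((4, 4))
target[0, 0] = 1.0
all_background = np.full((4, 4), 0.01)  # a prediction that ignores the foreground

print(soft_dice_loss(target, target))          # near 0: perfect overlap
print(soft_dice_loss(all_background, target))  # close to 1: foreground missed
print(focal_loss(all_background, target))
```

The toy case shows why Dice is popular for small structures. A prediction that labels everything as background is correct on 15 of 16 pixels, yet its Dice loss stays close to 1 because the overlap with the single foreground pixel is almost zero. With `gamma=0` and `alpha=0.5`, the focal loss reduces to half the ordinary binary cross-entropy.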

Abstract

The semantic image segmentation task consists of classifying each pixel of an image into one of a predefined set of classes. This task is part of scene understanding, that is, better explaining the global context of an image. In medical image analysis, image segmentation can be used for image-guided interventions, radiotherapy, or improved radiological diagnostics. In this review, we categorize the leading deep learning-based medical and non-medical image segmentation solutions into six main groups (deep architectural, data synthesis-based, loss function-based, sequenced models, weakly supervised, and multi-task methods) and provide a comprehensive review of the contributions in each group. Further, we analyze the variants within each group, discuss the limitations of current approaches, and present potential future research directions for semantic image segmentation.
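Per-pixel classification usually comes down to one step at the output of a segmentation network: each class gets a score map, and every pixel takes the class with the highest score. The minimal NumPy sketch below assumes a channel-first layout `(C, H, W)`. The function names are illustrative and not from the review.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax over the class axis."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def logits_to_label_map(logits):
    """Assign each pixel the index of its highest-scoring class.

    logits: array of shape (C, H, W), one score map per class (assumed layout)
    returns: integer array of shape (H, W) with values in [0, C)
    """
    return np.argmax(logits, axis=0)

# Three classes on a 2x2 image.
logits = np.zeros((3, 2, 2))
logits[1, 0, 0] = 5.0   # pixel (0, 0) favours class 1
logits[2, 1, 1] = 3.0   # pixel (1, 1) favours class 2

probs = softmax(logits)                # per-pixel class probabilities
labels = logits_to_label_map(logits)   # one class label per pixel
print(labels)
```

Ties, such as the all-zero pixels here, resolve to the lowest class index because `np.argmax` returns the first maximum.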

Paper Structure

This paper contains 43 sections, 38 equations, 16 figures, 3 tables.

Figures (16)

  • Figure 1: An overview of the deep learning based segmentation methods covered in this review.
  • Figure 2: A typical deep neural network based semantic segmentation pipeline. Each component in the pipeline indicates the section of this paper that covers the corresponding contributions.
  • Figure 3: Fully convolutional networks can efficiently learn to make dense predictions for per-pixel tasks like semantic segmentation (long2015fully).
  • Figure 4: Upsampling and fusion step of the fully convolutional networks (long2015fully).
  • Figure 5: Top: An illustration of the SegNet architecture. There are no fully connected layers, and hence it is only convolutional. Bottom: An illustration of SegNet and FCN (long2015fully) decoders. $a, b, c, d$ correspond to values in a feature map. SegNet uses the max-pooling indices to upsample (without learning) the feature map(s) and convolves with a trainable decoder filter bank. FCN upsamples by learning to deconvolve the input feature map and adds the corresponding encoder feature map to produce the decoder output. This feature map is the output of the max-pooling layer (includes sub-sampling) in the corresponding encoder. Note that there are no trainable decoder filters in FCN (badrinarayanan2015segnet).
  • ...and 11 more figures
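The Figure 5 caption contrasts two decoder upsampling strategies. SegNet reuses the encoder's max-pooling indices to place values back where the maxima came from, with no learned weights. FCN upsamples with a learned deconvolution (transposed convolution) and adds the encoder feature map. The NumPy sketch below covers only those upsampling steps on a single 2D feature map.

It makes several simplifying assumptions:

- The FCN-style step uses a non-overlapping stride-2 transposed convolution with a 2x2 kernel. That kernel is learned in FCN but fixed here.
- SegNet's subsequent trainable decoder convolution is omitted.
- All function names are hypothetical.

```python
import numpy as np

def max_pool_2x2_with_indices(x):
    """2x2 max pooling (stride 2) that also records where each max came from.

    Returns the pooled map and, for each pooled cell, the flat index of the
    maximum in the input (what SegNet stores for its decoder).
    Odd trailing rows/columns are dropped.
    """
    h, w = x.shape
    pooled = np.zeros((h // 2, w // 2), dtype=x.dtype)
    indices = np.zeros((h // 2, w // 2), dtype=np.int64)
    for i in range(h // 2):
        for j in range(w // 2):
            window = x[2 * i:2 * i + 2, 2 * j:2 * j + 2]
            di, dj = divmod(int(np.argmax(window)), 2)  # position inside the window
            pooled[i, j] = window[di, dj]
            indices[i, j] = (2 * i + di) * w + (2 * j + dj)
    return pooled, indices

def segnet_unpool(pooled, indices, out_shape):
    """SegNet-style upsampling: put each value back at its remembered max
    location and fill every other position with zeros (no learned weights)."""
    out = np.zeros(out_shape, dtype=pooled.dtype)
    np.put(out, indices.ravel(), pooled.ravel())
    return out

def fcn_upsample_and_fuse(coarse, skip, kernel):
    """FCN-style decoder step (simplified): stride-2 transposed convolution
    with a 2x2 kernel, then element-wise addition of the encoder feature map.

    With stride equal to kernel size the output blocks do not overlap, so the
    transposed convolution equals the Kronecker product of input and kernel.
    """
    upsampled = np.kron(coarse, kernel)
    return upsampled + skip

x = np.array([[1, 2, 0, 0],
              [3, 4, 0, 5],
              [0, 0, 1, 0],
              [6, 0, 0, 2]], dtype=float)

pooled, idx = max_pool_2x2_with_indices(x)
sparse = segnet_unpool(pooled, idx, x.shape)  # sparse map, maxima restored in place

kernel = np.ones((2, 2))  # stands in for learned weights
dense = fcn_upsample_and_fuse(pooled, np.zeros_like(x), kernel)
print(sparse)
print(dense)
```

The output makes the difference in the caption concrete. SegNet's unpooled map is sparse and keeps exact spatial positions of the maxima, which is why it is followed by trainable convolutions to densify it. The FCN-style output is dense, and its quality depends on the learned kernel and on the added encoder features.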