Demystifying the Effect of Receptive Field Size in U-Net Models for Medical Image Segmentation

Vincent Loos, Rohit Pardasani, Navchetan Awasthi

TL;DR

The work demonstrates that there is an optimal theoretical receptive field (TRF) size that balances capturing a wider global context against computational efficiency, thereby optimizing model performance and paving the way for future exploration of other segmentation architectures.

Abstract

Medical image segmentation is a critical task in healthcare applications, and U-Nets have demonstrated promising results. This work examines the understudied aspect of receptive field (RF) size and its impact on the U-Net and Attention U-Net architectures. It explores the relationship between RF size, the characteristics of the region of interest, and model performance, as well as the balance between RF size and computational cost for both architectures across different datasets. It also proposes a mathematical notation for the theoretical receptive field (TRF) of a given layer in a network, along with two new metrics: the effective receptive field (ERF) rate, which quantifies the fraction of significantly contributing pixels within the ERF relative to the TRF area, and the Object rate, which assesses the size of the segmentation object relative to the TRF size. The results demonstrate that there is an optimal TRF size that balances capturing a wider global context against computational efficiency, thereby optimizing model performance. A distinct correlation is observed between data complexity and the required TRF size: segmentation based solely on contrast achieved peak performance even with smaller TRF sizes, whereas more complex segmentation tasks necessitated larger TRFs. Attention U-Net models consistently outperformed their U-Net counterparts, highlighting the value of attention mechanisms regardless of TRF size. These insights are a resource for developing more efficient U-Net-based architectures for medical imaging and pave the way for future exploration. A tool is also developed that calculates the TRF for a U-Net (or Attention U-Net) model and suggests an appropriate TRF size for a given model and dataset.
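
To make the idea of a TRF calculation concrete, the following is a minimal sketch and not the authors' tool. It assumes the standard receptive-field recurrence (each layer with kernel size k grows the field by (k - 1) times the current pixel spacing, or "jump"), two stride-1 convolutions per block, 2x2 max pooling with stride 2 in the encoder, and 2x upsampling in the decoder that halves the jump without widening the field. The function name `unet_trf`, the parameter `convs_per_block`, and the reading of depth as the number of pooling steps are all illustrative assumptions; the paper's own notation may differ. Only the deepest path is followed, since skip connections see a smaller field.

```python
def unet_trf(depth: int, kernel_size: int, convs_per_block: int = 2) -> int:
    """Approximate TRF (side length in pixels) of an output pixel of a U-Net.

    Assumptions (not taken from the paper): `depth` counts pooling steps,
    every block has `convs_per_block` stride-1 convolutions, pooling is
    2x2 with stride 2, and upsampling doubles the resolution.
    """
    rf = 1    # receptive field of a single input pixel
    jump = 1  # distance, in input pixels, between adjacent feature-map pixels

    # Encoder: convolutions widen the field, pooling widens it and doubles the jump.
    for _ in range(depth):
        for _ in range(convs_per_block):
            rf += (kernel_size - 1) * jump
        rf += (2 - 1) * jump  # 2x2 max pooling
        jump *= 2

    # Bottleneck: convolutions at the coarsest resolution.
    for _ in range(convs_per_block):
        rf += (kernel_size - 1) * jump

    # Decoder: 2x upsampling halves the jump, then convolutions widen the field.
    for _ in range(depth):
        jump //= 2
        for _ in range(convs_per_block):
            rf += (kernel_size - 1) * jump

    return rf


if __name__ == "__main__":
    for n in range(5):
        print(f"depth={n}, k=3 -> TRF side = {unet_trf(n, 3)}")
```

Under these assumptions, increasing either the depth or the kernel size enlarges the TRF, which is the tuning behaviour the abstract relies on; the absolute numbers depend on the conventions chosen above.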

Paper Structure (31 sections, 13 equations, 9 figures, 3 tables, 1 algorithm)

Figures (9)

  • Figure 1: Variable attention U-Net in which the depth ($n$), kernel size of the convolution layers ($k$), and number of channels ($c$) can be tuned to alter the size of the TRF. It can be converted to a regular U-Net by simply removing the attention gates and gating signals.
  • Figure 2: Example of Theoretical Receptive Field (TRF) and Effective Receptive Field (ERF) in an image. The yellow square denotes the TRF, the maximum input area influencing the output pixel located at the centre of the square. The gray pixels, representing the ERF, show the actual input area affecting a neuron's activation, with intensity indicating the impact level.
  • Figure 3: Typical images and segmentation masks for the synthetic datasets (A and B) and medical datasets (Fetal head, Fetal head 2, Kidneys, Lungs, Thyroid, Nerve).
  • Figure 4: Examples of determining the threshold ($\varepsilon$) for the ERF rate with KDE for bimodally distributed ERF pixel values (top row) and positively skewed distributed ERF pixel values (bottom row).
  • Figure 5: Performance of the shapes datasets (A and B) for the regular U-Net.
  • ...and 4 more figures
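
The two metrics named in the abstract, together with the threshold $\varepsilon$ mentioned in the Figure 4 caption, can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's definition: the ERF rate is taken as the number of ERF pixels whose absolute value exceeds $\varepsilon$ divided by the TRF area, and the Object rate is taken as the object's pixel area divided by the TRF area. The function names, the area-based reading of the Object rate, and passing $\varepsilon$ in directly (rather than estimating it with KDE) are all assumptions.

```python
from typing import Sequence


def erf_rate(erf_map: Sequence[Sequence[float]], trf_size: int, eps: float) -> float:
    """Share of the TRF area covered by significantly contributing ERF pixels.

    Assumption: a pixel contributes significantly when |value| > eps, and the
    TRF is a square with side `trf_size`.
    """
    significant = sum(1 for row in erf_map for value in row if abs(value) > eps)
    return significant / float(trf_size ** 2)


def object_rate(mask: Sequence[Sequence[int]], trf_size: int) -> float:
    """Size of the segmentation object relative to the TRF.

    Assumption: both sizes are measured as pixel areas.
    """
    object_area = sum(1 for row in mask for value in row if value)
    return object_area / float(trf_size ** 2)


if __name__ == "__main__":
    # Toy 3x3 ERF map: a strong centre pixel and four weaker neighbours.
    erf = [
        [0.0, 0.1, 0.0],
        [0.1, 1.0, 0.1],
        [0.0, 0.1, 0.0],
    ]
    print(erf_rate(erf, trf_size=3, eps=0.05))

    # Toy 2x2 mask with three object pixels, compared against a 4x4 TRF.
    mask = [[0, 1], [1, 1]]
    print(object_rate(mask, trf_size=4))
```

Read this way, an ERF rate close to 1 would mean most of the TRF actually influences the output, while an Object rate above 1 would mean the object is larger than what a single output pixel can see.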