Segmenting Fetal Head with Efficient Fine-tuning Strategies in Low-resource Settings: an empirical study with U-Net

Fangyijie Wang, Guénolé Silvestre, Kathleen M. Curran

TL;DR

This study addresses the challenge of accurately segmenting the fetal head in ultrasound images for head circumference estimation in low-resource settings. It evaluates a U‑Net model with a lightweight MobileNet v2 encoder and a range of fine-tuning strategies, showing that decoder-focused fine-tuning (FT_Decoder) yields superior segmentation with far fewer trainable parameters. Across diverse datasets from high- and low-resource contexts, FT_Decoder demonstrates strong generalization and transferability, outperforming training from scratch and encoder-focused approaches. The work provides practical guidance for efficient fetal head segmentation and releases code and fine-tuned weights to facilitate adoption in resource-constrained environments.

Abstract

Accurate measurement of fetal head circumference is crucial for estimating fetal growth during routine prenatal screening. Prior to measurement, it is necessary to accurately identify and segment the region of interest, specifically the fetal head, in ultrasound images. Recent advances in deep learning have brought significant progress in segmenting the fetal head using encoder-decoder models. Among these models, U-Net has become a standard approach for accurate segmentation. However, training an encoder-decoder model can be a time-consuming process that demands substantial computational resources. Moreover, fine-tuning these models is particularly challenging when only a limited amount of data is available. There are still no "best-practice" guidelines for optimal fine-tuning of U-Net for fetal ultrasound image segmentation. This work summarizes existing fine-tuning strategies across various backbone architectures and model components, using ultrasound data from the Netherlands, Spain, Malawi, Egypt and Algeria. Our study shows that (1) fine-tuning U-Net leads to better performance than training from scratch, (2) fine-tuning the decoder is superior to other strategies, and (3) a network architecture with fewer parameters can achieve similar or better performance. We also demonstrate the effectiveness of fine-tuning strategies in low-resource settings and further extend our experiments to few-shot learning. Lastly, we have publicly released our code and selected fine-tuned weights.
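
To make the idea of component-wise fine-tuning concrete, the nine strategies listed in the Figure 2 caption can be written down as a small lookup of which blocks receive gradient updates. This is only a sketch and not the authors' released code: the block names (`encoder`, `decoder_1` ... `decoder_5`) and both helper functions are hypothetical. It also assumes that the decoder is frozen in strategy (c) and that the encoder is frozen in strategies (d) to (i), which the caption does not state explicitly.

```python
from typing import Dict, List

# Which blocks are trained in each strategy, transcribed from the Figure 2 caption.
# Assumption (not stated in the caption): the decoder is frozen in (c),
# and the encoder is frozen in (d)-(i).
STRATEGIES = {
    "a": {"encoder": False, "decoder_layers": [1, 2, 3, 4, 5]},  # encoder frozen at random weights
    "b": {"encoder": True, "decoder_layers": []},                # encoder trained from random weights
    "c": {"encoder": True, "decoder_layers": []},                # FT_Encoder, ImageNet-pretrained encoder
    "d": {"encoder": False, "decoder_layers": [1]},
    "e": {"encoder": False, "decoder_layers": [1, 2]},
    "f": {"encoder": False, "decoder_layers": [1, 2, 3]},
    "g": {"encoder": False, "decoder_layers": [3, 4, 5]},
    "h": {"encoder": False, "decoder_layers": [5]},
    "i": {"encoder": False, "decoder_layers": [1, 2, 3, 4, 5]},  # FT_Decoder
}

# Hypothetical block names for a U-Net with one encoder and five decoder layers.
ALL_BLOCKS = ["encoder"] + ["decoder_%d" % i for i in range(1, 6)]


def trainable_blocks(strategy: str) -> List[str]:
    """Return the names of the blocks that are trained under a strategy."""
    spec = STRATEGIES[strategy]
    names = ["encoder"] if spec["encoder"] else []
    names += ["decoder_%d" % i for i in spec["decoder_layers"]]
    return names


def freeze_plan(strategy: str) -> Dict[str, bool]:
    """Map every block name to True (trainable) or False (frozen)."""
    active = set(trainable_blocks(strategy))
    return {name: name in active for name in ALL_BLOCKS}


if __name__ == "__main__":
    for key in sorted(STRATEGIES):
        print(key, trainable_blocks(key))
```

In a PyTorch model, a plan like this would typically be applied by setting `requires_grad` on the parameters of each block to the corresponding flag, so that only the selected blocks are updated during training.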

Paper Structure (13 sections, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Overview of the U-Net network with encoder ($E$) and decoder ($D$). The encoder stack is replaced with MobileNet v2. The features from layers within $E$ are concatenated to features in $D$ by skip connections. The left figure illustrates the details of the bottleneck. $N$: batch size.
  • Figure 2: The nine fine-tuning strategies. ($a$) freezes the entire encoder with random weights and trains the decoder; ($b$) trains the encoder with random weights, but freezes the decoder; ($c$) FT_Encoder strategy trains the entire encoder with pre-trained ImageNet weights; ($d$) trains decoder layer 1; ($e$) trains decoder layers 1,2; ($f$) trains decoder layers 1,2,3; ($g$) trains decoder layers 3,4,5; ($h$) trains decoder layer 5; and ($i$) FT_Decoder strategy trains decoder layers 1,2,3,4,5. $E$: Encoder; $D$: Decoder; FT: Fine-tuning.
  • Figure 3: Examples of the maternal-fetal head US images from our multi-centre data set.
  • Figure 4: Selected examples of ground truth masks and of masks predicted by the U-Net baseline and the fine-tuning strategies FT_Encoder and FT_Decoder, respectively. Numbers indicate the DSC of each image.
  • Figure 5: Results of the U-Net baseline and the fine-tuning strategies FT_Encoder and FT_Decoder. The left plot shows $\operatorname{DSC}$ versus Trainable Layers by Train Size. The right plot shows the mean and variance of $\operatorname{DSC}$.
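
Figures 4 and 5 report results as DSC, the Dice similarity coefficient, which for a predicted mask $A$ and a ground truth mask $B$ is $2|A \cap B| / (|A| + |B|)$. The sketch below computes it for binary masks given as nested lists of 0/1 values. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the paper's own evaluation code may handle thresholding and empty masks differently.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient for two binary masks of the same shape.

    Both masks are nested lists (rows of 0/1 values).
    """
    flat_pred = [value for row in pred for value in row]
    flat_truth = [value for row in truth for value in row]
    if len(flat_pred) != len(flat_truth):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(flat_pred, flat_truth) if p and t)
    # Foreground pixel counts of each mask, added together.
    total = sum(1 for p in flat_pred if p) + sum(1 for t in flat_truth if t)

    if total == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    predicted = [[0, 1, 1],
                 [0, 1, 0]]
    reference = [[0, 1, 0],
                 [0, 1, 1]]
    # Two overlapping pixels, three foreground pixels per mask: 2 * 2 / 6.
    print(round(dice_coefficient(predicted, reference), 3))
```

A DSC of 1 means the predicted and reference masks overlap exactly, and a DSC of 0 means they share no foreground pixels.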