Head and Neck Tumor Segmentation of MRI from Pre- and Mid-radiotherapy with Pre-training, Data Augmentation and Dual Flow UNet

Litingyu Wang, Wenjun Liao, Shichuan Zhang, Guotai Wang

TL;DR

This work tackles MRI-based segmentation of head and neck tumors and metastatic lymph nodes across pre-RT and mid-RT stages, addressing data scarcity and modality differences. The authors propose a multi-strategy framework combining external CT pre-training with histogram matching, MixUp data augmentation, and a Dual Flow UNet (DFUNet) that fuses pre-RT guidance into mid-RT segmentation via cross-attention. In five-fold cross-validation on the HNTS-MRG2024 MRI dataset, the method achieves aggregated DSCs of $80.65\%$ (Task-1) and $74.68\%$ (Task-2), with final test scores of $82.38\%$ and $72.53\%$, respectively, indicating robust gains for GTVn and variable gains for GTVp. The study demonstrates that cross-modal pre-training and multi-encoder fusion can improve MR-guided H&N segmentation, offering a pathway to better adaptive radiotherapy planning while highlighting challenges from class imbalance and model generalization across tumor subtypes.
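
As a rough illustration of the MixUp augmentation mentioned above, the following sketch blends each sample in a batch with a randomly chosen partner. It is a generic MixUp under stated assumptions, not the authors' implementation: the function name `mixup`, the `alpha` value, the array shapes, and the use of one-hot labels for three classes (background, GTVp, GTVn) are all illustrative choices.

```python
import numpy as np

def mixup(images, labels, alpha=0.2, rng=None):
    """Blend every sample with a random partner from the same batch.

    images: float array, shape (batch, channels, D, H, W)
    labels: one-hot float array, shape (batch, classes, D, H, W)
    alpha:  Beta distribution parameter (hypothetical value, not from the paper)
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)           # mixing coefficient in [0, 1]
    perm = rng.permutation(len(images))    # partner index for each sample
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    mixed_labels = lam * labels + (1.0 - lam) * labels[perm]
    return mixed_images, mixed_labels, lam

# Toy batch: 4 single-channel volumes with 3-class one-hot labels.
rng = np.random.default_rng(0)
images = rng.random((4, 1, 8, 8, 8))
class_ids = rng.integers(0, 3, size=(4, 8, 8, 8))
labels = np.moveaxis(np.eye(3)[class_ids], -1, 1)   # -> (4, 3, 8, 8, 8)

mixed_x, mixed_y, lam = mixup(images, labels, alpha=0.2, rng=rng)
```

Because the same coefficient is applied to images and labels, the mixed labels remain valid soft labels: the class channels still sum to one at every voxel.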

Abstract

Head and neck tumors and metastatic lymph nodes are crucial for treatment planning and prognostic analysis. Accurate segmentation and quantitative analysis of these structures require pixel-level annotation, making automated segmentation techniques essential for the diagnosis and treatment of head and neck cancer. In this study, we investigated the effects of multiple strategies on the segmentation of pre-radiotherapy (pre-RT) and mid-radiotherapy (mid-RT) images. For the segmentation of pre-RT images, we used: 1) a fully supervised learning approach, and 2) the same approach enhanced with pre-trained weights and the MixUp data augmentation technique. For mid-RT images, we introduced a novel, computationally friendly network architecture that features separate encoders for mid-RT images and for registered pre-RT images with their labels. The mid-RT encoder branch progressively integrates information from the pre-RT images and labels during forward propagation. We selected the highest-performing model from each fold and averaged their predictions as an ensemble for inference. In the final test, our models (team HiLab) achieved an aggregated Dice Similarity Coefficient (DSC) of 82.38% for pre-RT and 72.53% for mid-RT. Our code is available at https://github.com/WltyBY/HNTS-MRG2024_train_code.
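
For readers unfamiliar with the aggregated DSC reported above, the sketch below shows one common way to compute it: intersections and volumes are summed over all cases before the ratio is taken, so a case with an empty ground truth does not produce an undefined per-case score. This is an illustrative sketch, not the challenge's official evaluation code; the function name `aggregated_dsc` and the toy masks are assumptions.

```python
import numpy as np

def aggregated_dsc(preds, targets):
    """Aggregated Dice over a set of cases for one structure.

    DSC_agg = 2 * sum_i |P_i & T_i| / sum_i (|P_i| + |T_i|)
    preds, targets: sequences of binary masks, one pair per case.
    """
    intersection = 0
    total = 0
    for pred, target in zip(preds, targets):
        pred = np.asarray(pred, dtype=bool)
        target = np.asarray(target, dtype=bool)
        intersection += np.logical_and(pred, target).sum()
        total += pred.sum() + target.sum()
    # Undefined only if every prediction and every target is empty.
    return 2.0 * intersection / total if total > 0 else float("nan")

# Two toy cases; the second has no tumor and no predicted tumor.
preds = [np.array([1, 1, 0, 0]), np.array([0, 0, 0, 0])]
targets = [np.array([1, 0, 0, 0]), np.array([0, 0, 0, 0])]

score = aggregated_dsc(preds, targets)   # 2 * 1 / (2 + 1) = 2/3
```

In practice the metric would be computed separately for GTVp and GTVn and then averaged; that final averaging step is omitted here.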

Paper Structure

This paper contains 19 sections, 3 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Two architectures used for training. (a) An encoder-decoder network, named basic segmentation network, with MixUp diagram. (b) Two separate encoders with one decoder named Dual Flow UNet (DFUNet).
  • Figure 2: CNN-based cross attention block to integrate secondary information $f_{pre}$ into primary information $f_{mid}$.
  • Figure 3: Raw image (the first column) and corresponding augmented images using Bézier curves [liu2024rpl].
  • Figure 4: Histogram matching on the SegRap2023 dataset. (a) shows the source image and its grayscale histogram from the SegRap2023 dataset. (b) displays the reference image from the HNTS-MRG2024 dataset, and (c) presents the matched image.
  • Figure 5: Task-1 examples showing the segmentation of GTVp (red) and GTVn (green), with mean DSC values in white. (a) is a well-predicted case, while (b) is a poorly predicted one. Here, classifying a sample as well or poorly predicted refers to whether it benefits from the method's improvements.
  • ...and 1 more figures
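
The histogram matching shown in Figure 4 can be sketched with a standard cumulative-distribution mapping: each source intensity is replaced by the reference intensity at the same quantile. The code below is a minimal NumPy sketch of that general technique, assuming synthetic arrays in place of real SegRap2023 and HNTS-MRG2024 volumes; the function name `match_histogram` and the toy intensity distributions are illustrative, and the authors' actual preprocessing may differ.

```python
import numpy as np

def match_histogram(source, reference):
    """Map source intensities so their distribution follows the reference."""
    # Unique intensities, their positions in the source, and their counts.
    src_values, src_index, src_counts = np.unique(
        source.ravel(), return_inverse=True, return_counts=True
    )
    ref_values, ref_counts = np.unique(reference.ravel(), return_counts=True)

    # Empirical cumulative distribution functions of both images.
    src_cdf = np.cumsum(src_counts) / source.size
    ref_cdf = np.cumsum(ref_counts) / reference.size

    # For each source quantile, look up the reference intensity at that quantile.
    mapped_values = np.interp(src_cdf, ref_cdf, ref_values)
    return mapped_values[src_index].reshape(source.shape)

# Synthetic stand-ins: a "CT-like" source and an "MRI-like" reference volume.
rng = np.random.default_rng(0)
source = rng.normal(loc=100.0, scale=20.0, size=(16, 16, 16))
reference = rng.gamma(shape=2.0, scale=50.0, size=(16, 16, 16))

matched = match_histogram(source, reference)
```

The mapping is monotonic, so the relative ordering of voxel intensities in the source is preserved while its overall intensity distribution is pulled toward that of the reference.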