Head and Neck Tumor Segmentation of MRI from Pre- and Mid-radiotherapy with Pre-training, Data Augmentation and Dual Flow UNet

Litingyu Wang; Wenjun Liao; Shichuan Zhang; Guotai Wang

TL;DR

This work tackles MRI-based segmentation of head and neck tumors and metastatic lymph nodes at the pre-radiotherapy (pre-RT) and mid-radiotherapy (mid-RT) stages, addressing data scarcity and modality differences. The authors propose a multi-strategy framework that combines external CT pre-training with histogram matching, MixUp data augmentation, and a Dual Flow UNet (DFUNet) that fuses pre-RT guidance into mid-RT segmentation via cross-attention. In five-fold cross-validation on the HNTS-MRG2024 MRI dataset, the method achieves aggregated DSCs of $80.65\%$ (Task-1, pre-RT) and $74.68\%$ (Task-2, mid-RT), with final test scores of $82.38\%$ and $72.53\%$, respectively, indicating robust gains for the nodal gross tumor volume (GTVn) and variable gains for the primary gross tumor volume (GTVp). The study demonstrates that cross-modal pre-training and multi-encoder fusion can improve MR-guided head and neck segmentation, offering a pathway to better adaptive radiotherapy planning while highlighting challenges from class imbalance and model generalization across tumor subtypes.
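As an illustration of the MixUp augmentation mentioned above, the following is a minimal NumPy sketch rather than the authors' implementation. It assumes one-hot segmentation label maps with the class axis at position 1, mixes each sample with a randomly chosen partner from the same batch, and draws the mixing weight from a Beta distribution. The function name `mixup` and the default `alpha=0.2` are hypothetical choices for this example.

```python
import numpy as np


def mixup(images, labels_onehot, alpha=0.2, rng=None):
    """Blend each sample in a batch with a randomly chosen partner.

    images:        float array, shape (N, C, ...)
    labels_onehot: float array, shape (N, K, ...), one-hot over axis 1
    alpha:         Beta distribution parameter (hypothetical default)
    """
    rng = np.random.default_rng() if rng is None else rng
    # One mixing weight for the whole batch, lam ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha)
    # Random pairing of samples within the batch.
    perm = rng.permutation(images.shape[0])
    mixed_images = lam * images + (1.0 - lam) * images[perm]
    # Labels are mixed with the same weight, giving soft targets.
    mixed_labels = lam * labels_onehot + (1.0 - lam) * labels_onehot[perm]
    return mixed_images, mixed_labels, lam
```

Because the mixed labels are convex combinations of one-hot maps, they still sum to one over the class axis and can be used with a loss that accepts soft targets.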

Abstract

Accurate delineation of head and neck tumors and metastatic lymph nodes is crucial for treatment planning and prognostic analysis. Segmentation and quantitative analysis of these structures require pixel-level annotation, which makes automated segmentation techniques essential for the diagnosis and treatment of head and neck cancer. In this study, we investigated the effects of multiple strategies on the segmentation of pre-radiotherapy (pre-RT) and mid-radiotherapy (mid-RT) images. For the segmentation of pre-RT images, we utilized: 1) a fully supervised learning approach, and 2) the same approach enhanced with pre-trained weights and the MixUp data augmentation technique. For mid-RT images, we introduced a novel, computationally friendly network architecture with separate encoders: one for mid-RT images and one for registered pre-RT images together with their labels. The mid-RT encoder branch progressively integrates information from the pre-RT images and labels during forward propagation. For inference, we selected the highest-performing model from each fold and averaged their predictions as an ensemble. In the final test, our models, submitted as team HiLab, achieved an aggregated Dice Similarity Coefficient (DSC) of 82.38% for pre-RT and 72.53% for mid-RT. Our code is available at https://github.com/WltyBY/HNTS-MRG2024_train_code.
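For readers unfamiliar with the aggregated DSC, the sketch below shows how such a score is commonly computed: intersections and volumes are summed over all cases before the ratio is taken, $\mathrm{DSC}_{agg} = 2\sum_i |A_i \cap B_i| \,/\, \sum_i (|A_i| + |B_i|)$, instead of averaging per-case Dice scores. This is an illustrative sketch, not the official evaluation code. The function name `aggregated_dsc` and the label convention (1 and 2 for the two tumor classes) are assumptions made for this example.

```python
import numpy as np


def aggregated_dsc(preds, targets, label):
    """Aggregated Dice for one label over a list of cases.

    preds, targets: sequences of integer label arrays, one pair per case.
    label:          integer class id to evaluate (assumed convention).
    """
    intersection = 0
    total = 0
    for pred, target in zip(preds, targets):
        p = pred == label
        t = target == label
        # Accumulate over cases first; divide only once at the end.
        intersection += int(np.logical_and(p, t).sum())
        total += int(p.sum()) + int(t.sum())
    # Undefined if the label never appears in predictions or targets.
    return 2.0 * intersection / total if total > 0 else float("nan")


# Toy example with two tiny "cases" and two foreground labels.
preds = [np.array([1, 1, 0, 2]), np.array([0, 0, 0, 0])]
targets = [np.array([1, 0, 0, 2]), np.array([1, 0, 0, 0])]
per_label = [aggregated_dsc(preds, targets, label) for label in (1, 2)]
mean_dsc = float(np.mean(per_label))
```

Summing before dividing means a case in which a structure is absent does not produce an undefined per-case score; it simply contributes its false positives or false negatives to the pooled totals.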
