MicroSegNet: A Deep Learning Approach for Prostate Segmentation on Micro-Ultrasound Images
Hongxu Jiang, Muhammad Imran, Preethika Muralidharan, Anjali Patel, Jake Pensa, Muxuan Liang, Tarik Benidir, Joseph R. Grajo, Jason P. Joseph, Russell Terry, John Michael DiBianco, Li-Ming Su, Yuyin Zhou, Wayne G. Brisbane, Wei Shao
TL;DR
MicroSegNet tackles the challenging problem of prostate capsule segmentation on high-resolution micro-ultrasound images by integrating a transformer-based TransUNet backbone with multi-scale deep supervision and a novel annotation-guided BCE loss that prioritizes hard-to-segment regions defined by expert–non-expert annotation disagreements. The approach yields state-of-the-art Dice and boundary accuracy (DSC ≈ 0.939, HD95 ≈ 2.02 mm) on a dedicated micro-US dataset and even surpasses expert and novice human annotators in segmentation quality. Key contributions include the annotation-guided loss, multi-scale supervision, and the first publicly available micro-US dataset with expert and non-expert annotations, enabling broader benchmarking and development. The work supports real-time, accurate capsule delineation that can improve biopsy targeting, image registration, and treatment planning in prostate cancer care, while acknowledging limitations of single-center data and device-specific settings. Overall, MicroSegNet demonstrates the potential for precise, scalable automated segmentation in micro-US guided procedures and provides resources to accelerate future research.
Abstract
Micro-ultrasound (micro-US) is a novel 29-MHz ultrasound technique that provides 3-4 times higher resolution than traditional ultrasound, potentially enabling low-cost, accurate diagnosis of prostate cancer. Accurate prostate segmentation is crucial for prostate volume measurement, cancer diagnosis, prostate biopsy, and treatment planning. However, prostate segmentation on micro-US is challenging due to artifacts and indistinct borders between the prostate, bladder, and urethra in the midline. This paper presents MicroSegNet, a multi-scale annotation-guided transformer UNet model designed specifically to tackle these challenges. During training, MicroSegNet focuses more on regions that are hard to segment (hard regions), characterized by discrepancies between expert and non-expert annotations. We achieve this by proposing an annotation-guided binary cross entropy (AG-BCE) loss that assigns a larger weight to prediction errors in hard regions and a smaller weight to prediction errors in easy regions. The AG-BCE loss is integrated into training through multi-scale deep supervision, enabling MicroSegNet to capture global contextual dependencies and local information at various scales. We trained our model using micro-US images from 55 patients and evaluated it on 20 patients. Our MicroSegNet model achieved a Dice coefficient of 0.939 and a Hausdorff distance of 2.02 mm, outperforming several state-of-the-art segmentation methods, as well as three human annotators with different experience levels. Our code is publicly available at https://github.com/mirthAI/MicroSegNet and our dataset is publicly available at https://zenodo.org/records/10475293.
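To make the idea of the AG-BCE loss concrete, the following is a minimal sketch in plain Python, not the released implementation. It assumes that hard regions are the pixels where the expert and non-expert masks disagree, that the expert mask serves as the training target, and that the loss is averaged over pixels. The function names, the weight values (`hard_weight`, `easy_weight`), and the equal per-scale weights are hypothetical choices made for illustration only.

```python
import math


def hard_region_mask(expert, non_expert):
    """Return 1 for pixels where the two binary annotations disagree (hard), else 0 (easy)."""
    return [int(e != n) for e, n in zip(expert, non_expert)]


def ag_bce_loss(pred, expert, non_expert, hard_weight=4.0, easy_weight=1.0, eps=1e-7):
    """Weighted binary cross entropy against the expert mask.

    pred holds foreground probabilities for a flattened image. The weight values
    are illustrative assumptions, not the values used in the paper.
    """
    hard = hard_region_mask(expert, non_expert)
    total = 0.0
    for p, y, h in zip(pred, expert, hard):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        total += (hard_weight if h else easy_weight) * bce
    return total / len(pred)


def multi_scale_ag_bce(scales, scale_weights=None, **loss_kwargs):
    """Combine per-scale AG-BCE losses, in the spirit of deep supervision.

    scales is a list of (pred, expert, non_expert) tuples, one per output
    resolution. Equal scale weights are an assumption of this sketch.
    """
    if scale_weights is None:
        scale_weights = [1.0 / len(scales)] * len(scales)
    return sum(
        w * ag_bce_loss(p, e, n, **loss_kwargs)
        for w, (p, e, n) in zip(scale_weights, scales)
    )


# Toy example: four pixels, with the annotators disagreeing on the second one.
expert = [1, 1, 0, 0]
non_expert = [1, 0, 0, 0]
pred = [0.9, 0.4, 0.1, 0.2]
loss = ag_bce_loss(pred, expert, non_expert)
```

In the toy example, the second pixel is both a hard pixel and the worst prediction, so raising `hard_weight` above `easy_weight` increases its share of the loss, which is the behavior the abstract describes. Setting both weights to 1 reduces the function to ordinary mean BCE.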
