Multi-scale Cascaded Foundation Model for Whole-body Organs-at-risk Segmentation
Rui Hao, Dayu Tan, Qiankun Li, Chunhou Zheng, Weimin Zhong, Zhigang Zeng
TL;DR
MCFNet targets robust whole-body OAR segmentation across diverse imaging protocols by fusing multi-scale features through two complementary backbones: the Sharp Extraction Backbone (SEB) for boundary fidelity and the Flexible Connection Backbone (FCB) for global context. A Linear Attention Transformer (LAT) is embedded in the skip connections to model long-range dependencies efficiently, and an Adaptive-MFA strategy dynamically aggregates multi-scale predictions during training. The architecture also includes a Cascaded Skip-connection Module and an Aggregation module that yields the final prediction $\mathrm{Pred} = u \times p_1 + v \times p_2 + w \times p_3 + x \times p_4$, with adaptive loss weighting across the four decoder scales. Experiments on ten heterogeneous datasets show strong cross-dataset generalization and state-of-the-art performance, with improved boundary accuracy and robustness in clinical scenarios. The code is publicly available for reproducibility.
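The aggregation formula can be illustrated with a minimal sketch. It assumes that $p_1, \dots, p_4$ are per-pixel foreground-probability maps from the four decoder scales, already resampled to a common size, and that $u, v, w, x$ are scalar weights. The summary does not say how these weights are obtained, so the equal values below, the 2x2 maps, and the function name `aggregate_predictions` are purely illustrative and are not taken from the MCFNet code.

```python
from typing import List, Sequence

Map2D = List[List[float]]


def aggregate_predictions(preds: Sequence[Map2D], weights: Sequence[float]) -> Map2D:
    """Return the per-pixel weighted sum of same-sized prediction maps."""
    if not preds:
        raise ValueError("need at least one prediction map")
    if len(preds) != len(weights):
        raise ValueError("need exactly one weight per prediction map")
    rows, cols = len(preds[0]), len(preds[0][0])
    for p in preds:
        if len(p) != rows or any(len(row) != cols for row in p):
            raise ValueError("all prediction maps must share one shape")
    # Pred[i][j] = sum_k weights[k] * preds[k][i][j]
    return [
        [sum(wt * p[i][j] for wt, p in zip(weights, preds)) for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2x2 probability maps standing in for p_1..p_4.
p1 = [[1.0, 0.0], [0.5, 0.5]]
p2 = [[1.0, 0.0], [0.5, 1.0]]
p3 = [[0.0, 0.0], [0.5, 1.0]]
p4 = [[1.0, 1.0], [0.5, 0.0]]

# Illustrative equal weights; the real values of u, v, w, x are an assumption here.
u, v, w, x = 0.25, 0.25, 0.25, 0.25

pred = aggregate_predictions([p1, p2, p3, p4], [u, v, w, x])
print(pred)
```

With equal weights the result is simply the per-pixel mean of the four maps. Unequal weights would let the scales that localize boundaries better contribute more to the final mask.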
Abstract
Accurate segmentation of organs-at-risk (OARs) is vital for safe and precise radiotherapy and surgery. Most existing studies segment only a limited set of organs or regions, lacking a systematic treatment of OAR segmentation. We present a Multi-scale Cascaded Fusion Network (MCFNet) that aggregates features across multiple scales and resolutions. MCFNet consists of a Sharp Extraction Backbone for the downsampling path and a Flexible Connection Backbone for skip-connection fusion, strengthening representation learning in both stages. This design improves boundary localization and preserves fine structures while maintaining computational efficiency, enabling reliable performance even on low-resolution inputs. Experiments on an NVIDIA A6000 GPU using 36,131 image-mask pairs from 671 patients across 10 datasets show consistent robustness and strong cross-dataset generalization. An adaptive loss-aggregation strategy further stabilizes optimization and yields additional gains in accuracy and training efficiency. Through extensive validation, MCFNet outperforms existing methods, excelling in organ segmentation and providing reliable image-guided support for computer-aided diagnosis. Our solution aims to improve the precision and safety of radiotherapy and surgery while supporting personalized treatment. The code is available on GitHub: https://github.com/Henry991115/MCFNet.
