Multi-scale Cascaded Foundation Model for Whole-body Organs-at-risk Segmentation

Rui Hao, Dayu Tan, Qiankun Li, Chunhou Zheng, Weimin Zhong, Zhigang Zeng

TL;DR

MCFNet addresses robust whole-body OAR segmentation across diverse imaging protocols by fusing multi-scale features from two complementary backbones: the Sharp Extraction Backbone (SEB) for boundary fidelity and the Flexible Connection Backbone (FCB) for global context. A Linear Attention Transformer (LAT) embedded in the skip connections models long-range dependencies efficiently, and an Adaptive-MFA strategy dynamically aggregates multi-scale predictions during training. The architecture also includes a Cascaded Skip-connection Module and an Aggregation module whose final prediction is the weighted sum $\mathrm{Pred} = u\,p_1 + v\,p_2 + w\,p_3 + x\,p_4$ of the multi-scale predictions $p_1, \dots, p_4$, with adaptive loss weighting applied across the four decoder scales. Extensive experiments on ten heterogeneous datasets show strong cross-dataset generalization and state-of-the-art performance, with evidence of improved boundary accuracy and robustness in clinical scenarios. The authors release their code publicly for reproducibility.
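The Aggregation module's weighted sum Pred = u p_1 + v p_2 + w p_3 + x p_4 can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each p_i is a (classes, H, W) probability map, ordered finest first with each scale halving the resolution, that coarser maps are brought to full resolution by nearest-neighbour upsampling, and that the weights (u, v, w, x) come from a softmax over four free scalars. The helper names `upsample_nearest` and `aggregate_predictions` are hypothetical.

```python
import numpy as np


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) array by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def aggregate_predictions(preds, weight_logits):
    """Return Pred = u*p1 + v*p2 + w*p3 + x*p4 at the finest resolution.

    preds: four (C, H_i, W_i) probability maps, finest scale first (assumed).
    weight_logits: four unnormalised scalars; a softmax maps them to (u, v, w, x).
    """
    logits = np.asarray(weight_logits, dtype=float)
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    full_h = preds[0].shape[1]
    pred = np.zeros(preds[0].shape, dtype=float)
    for wt, p in zip(weights, preds):
        factor = full_h // p.shape[1]         # 1, 2, 4, 8 for halving scales
        pred += wt * upsample_nearest(p, factor)
    return pred, weights


# Hypothetical usage: 2 classes, a 16x16 finest map and three coarser maps.
preds = [np.full((2, 16 // 2**i, 16 // 2**i), float(i + 1)) for i in range(4)]
pred, weights = aggregate_predictions(preds, [0.0, 0.0, 0.0, 0.0])
print(pred.shape, weights)
```

Normalising the weights keeps Pred a convex combination, so it stays a valid probability map whenever each p_i is one; whether MCFNet constrains u, v, w, x this way is not stated here.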

Abstract

Accurate segmentation of organs-at-risk (OARs) is vital for safe and precise radiotherapy and surgery. Most existing studies segment only a limited set of organs or regions and lack a systematic treatment of OAR segmentation. We present a Multi-scale Cascaded Fusion Network (MCFNet) that aggregates features across multiple scales and resolutions. MCFNet consists of a Sharp Extraction Backbone for the downsampling path and a Flexible Connection Backbone for skip-connection fusion, strengthening representation learning in both stages. This design improves boundary localization and preserves fine structures while maintaining computational efficiency, enabling reliable performance even on low-resolution inputs. Experiments on an NVIDIA A6000 GPU using 36,131 image-mask pairs from 671 patients across 10 datasets show consistent robustness and strong cross-dataset generalization. An adaptive loss-aggregation strategy further stabilizes optimization and yields additional gains in accuracy and training efficiency. In extensive validation, MCFNet outperforms existing methods in organ segmentation and provides reliable image-guided support for computer-aided diagnosis. Our solution aims to improve the precision and safety of radiotherapy and surgery and to support personalized treatment. The code is available on GitHub: https://github.com/Henry991115/MCFNet.
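The abstract does not say how the adaptive loss-aggregation strategy (Adaptive-MFA) weights the four decoder scales. As a stand-in, the sketch below uses a different, well-known technique: homoscedastic-uncertainty weighting in the simplified form popularised by Kendall et al. (2018), applied to per-scale soft Dice losses. This is an illustrative assumption, not MCFNet's rule, and the names `soft_dice_loss`, `uncertainty_weighted_loss`, and `optimal_log_vars` are hypothetical.

```python
import numpy as np


def soft_dice_loss(prob, target, eps=1e-6):
    """Soft Dice loss between a probability map and a binary mask of equal shape."""
    inter = np.sum(prob * target)
    denom = np.sum(prob) + np.sum(target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def uncertainty_weighted_loss(scale_losses, log_vars):
    """Combine per-scale losses as sum_i exp(-s_i) * L_i + s_i.

    log_vars (s_i) would be learnable during training; the + s_i term stops
    every weight exp(-s_i) from collapsing to zero.
    """
    losses = np.asarray(scale_losses, dtype=float)
    s = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-s) * losses + s))


def optimal_log_vars(scale_losses):
    """Closed-form minimiser for fixed losses: d/ds [exp(-s) L + s] = 0 gives s = log L."""
    return np.log(np.asarray(scale_losses, dtype=float))


# Hypothetical per-scale predictions: coarser scales are blurrier (more noise).
target = np.zeros((8, 8))
target[2:6, 2:6] = 1.0
scale_preds = [np.clip(target + noise, 0.0, 1.0) for noise in (0.05, 0.1, 0.2, 0.3)]
losses = [soft_dice_loss(p, target) for p in scale_preds]
s_star = optimal_log_vars(losses)
print(np.exp(-s_star))  # effective weights 1 / L_i: larger for lower-loss scales
```

With this scheme the effective weight of each scale settles at 1 / L_i, so scales that are already fitting well are emphasised; a method tuned for boundary accuracy might prefer the opposite behaviour, which is one reason to treat this only as a sketch.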

Paper Structure

This paper contains 40 sections, 12 equations, 11 figures, 13 tables.

Figures (11)

  • Figure 1: MCFNet is trained on ten diverse datasets to perform whole-body organs-at-risk (OARs) segmentation, covering the head & neck, thorax, abdomen, prostate, and femur regions.
  • Figure 2: Model complexity on the CPCGEA dataset. (a) Model performance versus parameter count. (b) Model performance versus FLOPs.
  • Figure 3: Illustration of the overall MCFNet architecture. The network is built upon two complementary backbones: the Sharp Extraction Backbone for fine details and the Flexible Connection Backbone for global semantics, which are integrated via a Cascaded Skip-connection Module and an Aggregation Module.
  • Figure 4: (a) Illustration of the Cascaded Skip-connection Module (CSM). (b) Illustration of the Linear Attention Transformer Block (LAT) in the CSM.
  • Figure 5: Illustration of the Aggregation module that merges hierarchical multi-scale features from FCB and SEB into a compact representation supervised by Adaptive-MFA.
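The Linear Attention Transformer block in the Cascaded Skip-connection Module (Figure 4) is described only as modelling long-range dependencies efficiently. A common way to get that efficiency is kernelised linear attention (Katharopoulos et al., 2020), sketched below in NumPy as an assumption: MCFNet's LAT may use a different kernel, projections, heads, or normalisation. The feature map elu(x) + 1 lets the keys and values be summed once into a small d x d_v matrix, avoiding the N x N attention matrix over N = H x W pixels; the quadratic version is included only to check that both give the same result. Function names are hypothetical.

```python
import numpy as np


def elu_plus_one(x):
    """Positive feature map phi(x) = elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise)."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Kernelised attention in O(N * d * d_v) time and O(d * d_v) extra memory.

    q, k: (N, d) queries and keys; v: (N, d_v) values; N = H * W flattened pixels.
    """
    phi_q, phi_k = elu_plus_one(q), elu_plus_one(k)
    kv = phi_k.T @ v                   # (d, d_v): keys and values summed once
    z = phi_q @ phi_k.sum(axis=0)      # (N,): per-query normaliser
    return (phi_q @ kv) / (z[:, None] + eps)


def quadratic_reference(q, k, v):
    """Same kernel with an explicit N x N similarity matrix, used only to check."""
    sim = elu_plus_one(q) @ elu_plus_one(k).T
    return (sim @ v) / sim.sum(axis=1, keepdims=True)


# Hypothetical skip-connection feature map: 8 x 8 pixels, 16 channels.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((8 * 8, 16))
w_q, w_k, w_v = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
out = linear_attention(tokens @ w_q, tokens @ w_k, tokens @ w_v)
print(out.shape)
```

For a 128 x 128 skip feature, N = 16,384, so the quadratic form would need a 16,384 x 16,384 similarity matrix, while the linear form only keeps a d x d_v summary; that gap is the motivation for placing attention in high-resolution skip connections.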