A Continual Learning-driven Model for Accurate and Generalizable Segmentation of Clinically Comprehensive and Fine-grained Whole-body Anatomies in CT

Dazhou Guo, Zhanghexuan Ji, Yanzhou Su, Dandan Zheng, Heng Guo, Puyang Wang, Ke Yan, Yirui Wang, Qinji Yu, Zi Li, Minfeng Xu, Jianfeng Zhang, Haoshen Li, Jia Ge, Tsung-Ying Ho, Bing-Shen Huang, Tashan Ai, Kuaile Zhao, Na Shen, Qifeng Wang, Yun Bian, Tingyu Wu, Peng Du, Hua Zhang, Feng-Ming Kong, Alan L. Yuille, Cher Heng Tan, Chunyan Miao, Perry J. Pickhardt, Senxiang Yan, Ronald M. Summers, Le Lu, Dakai Jin, Xianghua Ye

TL;DR

CL-Net presents a continual learning-based framework for accurate and generalizable segmentation of an extremely fine-grained whole-body anatomy set in CT. It combines a frozen general encoder with many stratified, pruned decoders, a body-part guided merge, and EMA-based updates to learn from dozens of partially labeled datasets while avoiding catastrophic forgetting. Across internal and external tests, CL-Net achieves state-of-the-art or competitive accuracy with substantially smaller models than ensemble baselines, and it scales to hundreds of anatomies with efficient decoder pruning. The approach enables universal CT segmentation suitable for downstream oncology and chronic-disease tasks, outperforming SAM-style foundations and multi-dataset baselines in precision, stability, and efficiency.

Abstract

Precision medicine in the quantitative management of chronic diseases and oncology would be greatly improved if the Computed Tomography (CT) scan of any patient could be segmented, parsed, and analyzed in a precise and detailed way. However, no fully annotated CT dataset with all anatomies delineated exists for training, because of the exceptionally high manual cost, the need for specialized clinical expertise, and the time required to finish the task. To this end, we propose a novel continual learning-driven CT model that learns to segment the complete set of anatomies from dozens of previously partially labeled datasets, dynamically expanding its capacity to segment new ones without compromising previously learned organ knowledge. Existing multi-dataset approaches cannot dynamically segment new anatomies without catastrophic forgetting, and they encounter optimization difficulty or infeasibility when segmenting hundreds of anatomies across the whole range of body regions. Our single unified CT segmentation model, CL-Net, accurately segments a clinically comprehensive set of 235 fine-grained whole-body anatomies. Composed of a universal encoder and multiple optimized and pruned decoders, CL-Net is developed using 13,952 CT scans from 20 public and 16 private high-quality partially labeled CT datasets covering various vendors, contrast phases, and pathologies. Extensive evaluation demonstrates that CL-Net consistently outperforms the upper limit of an ensemble of 36 specialist nnUNets trained per dataset while using only 5% of the model size, and that it surpasses the segmentation accuracy of recent leading Segment Anything-style medical image foundation models by large margins. Our continual learning-driven CL-Net model lays a solid foundation for many downstream tasks in oncology and chronic diseases using the most widely adopted CT imaging.
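The idea of expanding capacity without compromising earlier knowledge can be illustrated with a toy sketch. This is not the CL-Net implementation: the class `ContinualSegmenter`, the function `toy_encoder`, and the threshold "decoders" are hypothetical stand-ins, and it only assumes what the abstract states, namely one shared encoder that stays fixed and separate decoders that are added per anatomy.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = List[float]


@dataclass
class ContinualSegmenter:
    """Toy stand-in: one fixed encoder shared by independent per-anatomy decoders."""

    encoder: Callable[[List[float]], Features]
    decoders: Dict[str, Callable[[Features], bool]] = field(default_factory=dict)

    def add_decoder(self, anatomy: str, decoder: Callable[[Features], bool]) -> None:
        # A new anatomy gets its own decoder; existing decoders are never modified.
        if anatomy in self.decoders:
            raise ValueError(f"decoder for {anatomy!r} already exists")
        self.decoders[anatomy] = decoder

    def predict(self, scan: List[float]) -> Dict[str, bool]:
        # The encoder runs once; every decoder reads the same features.
        feats = self.encoder(scan)
        return {name: dec(feats) for name, dec in self.decoders.items()}


def toy_encoder(scan: List[float]) -> Features:
    # Hypothetical "features": mean and max intensity of the scan.
    return [sum(scan) / len(scan), max(scan)]


model = ContinualSegmenter(encoder=toy_encoder)
model.add_decoder("liver", lambda f: f[0] > 0.4)

scan = [0.2, 0.6, 0.7]
before = model.predict(scan)

# Later, a dataset with a new anatomy arrives: only a new decoder is added.
model.add_decoder("kidney", lambda f: f[1] > 0.9)
after = model.predict(scan)
```

Because the encoder and the existing decoder are untouched when the second decoder is added, the earlier prediction is reproduced exactly, which is the sense in which this kind of design avoids forgetting by construction.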

Paper Structure

This paper contains 60 sections, 11 equations, 9 figures, 53 tables, 1 algorithm.

Figures (9)

  • Figure 1: Dataset fingerprints. Overview of the 36 datasets used for model development and internal validation, including 16 private (P) and 20 public (D) datasets. Datasets are categorized into head-neck (blue box), chest (green box), and abdomen (orange box) groups, based on the primary body region covered by their target organs, except for totalseg and body-linkmed-1, which span the entire body. For each dataset, the number of training CT scans, the number of target organs, and the vertical range of BPR scores of the CT scans and foreground labels are provided. The BPR score represents the relative vertical position of each slice in a CT scan, normalized from the bottom of the pelvis (0) to the top of the head (1). Additional dataset fingerprint details are available in Supplementary Sec. \ref{sec:dataset_details}.
  • Figure 2: Illustration of the learning process and architecture of CL-Net. A. CL-Net can be trained or updated in both the partial label (PL) and continual segmentation (CS) settings, with the pre-trained general encoder (GE) frozen. In the partial label segmentation setting, with simultaneous access to datasets of different body parts, the model directly learns different decoders to segment whole-body organs. In the continual segmentation setting, with sequentially available new datasets and no access to previous ones, lightweight decoders for the corresponding anatomical structures are added or updated, enabling segmentation of all learned organs without forgetting. B. An overview of the CL-Net framework: a GE for feature extraction, multiple decoders for organ-wise segmentation, an LTH-based decoder pruning module, and a prediction merging module.
  • Figure 3: The decoder-wise pruning rates and DSC score differences of cln_u36 after pruning. Evaluation of parameter pruning rates ($\mathcal{T}$, %) and DSC score differences ($\Delta$DSC, %) for the decoders between cln_u36_unprn and cln_u36. Decoders, except 'Body', are grouped into Head & Neck (28 decoders), Chest (31), Abdomen (19), Bone (11), LNS (2), and GTV (9). Yellow/blue bars (left axis) represent positive/negative $\Delta$DSC for each decoder after pruning, while red cross markers (right axis) indicate the decoder-wise $\mathcal{T}$. Results show minimal $\Delta$DSC (mostly within $\pm0.1\%$) and consistently high $\mathcal{T}$ (above 90% for most decoders), highlighting the efficiency of pruning with negligible impact on segmentation performance. Detailed organ-wise metrics and decoder-wise pruning rates are provided in Supplementary Sec. \ref{sec:ablation_dec} and Tables \ref{tab:ds36_hn}--\ref{tab:ds36_gtv}.
  • Figure 4: Comparison of cln_c5 with popular CS methods. Step-wise segmentation performance of cln_c5, MiB, PLOP, CSCLIP, and nnUNet (upper bound) is evaluated across four CS orders (one row per order) on five public datasets covering various body parts. Columns S0-S4 show the forgetting curves for each method on each dataset and order. The column 'DSC-Param#' compares mean DSC scores against the parameter sizes of the final models for all methods in each order. The CS orders of the five datasets are detailed at the bottom. cln_c5 shows no forgetting of previously learned tasks during continual learning across all orders and achieves the highest mean DSC scores with a much smaller model size compared with the nnUNet ensemble. The detailed numerical metrics are provided in Tables \ref{tab:css_res_avg}, \ref{tab:css_comp_res}, and \ref{tab:css_totalseg_organ}--\ref{tab:css_kits}.
  • Figure 5: Qualitative visualization of CL-Net whole-body anatomical structure segmentation. The first two rows depict the segmentation results of head & neck organs and the corresponding lymph node stations, respectively. The third and fourth rows show the segmentation of chest organs and the corresponding lymph node stations. The fifth row illustrates the segmentation of abdominal organs, while the final row shows bone segmentation. For better illustration, some organs (such as ChestWall) have been excluded, and certain sub-group organs, such as lung lobes and heart atria and ventricles, have been merged and rendered semi-transparent.
  • ...and 4 more figures
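
The two quantities plotted in Figure 3, the pruning rate $\mathcal{T}$ and the DSC difference $\Delta$DSC, can be computed as in the following minimal sketch. It uses the standard Dice definition, DSC = 2|P ∩ T| / (|P| + |T|). Everything else is an assumption made for illustration: the function names `dice_score` and `pruning_rate` are hypothetical, the pruning rate is taken to be the percentage of weights that are exactly zero, $\Delta$DSC is taken as pruned minus unpruned, and the masks and weights are made-up toy values, not results from the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice similarity coefficient for two flat binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention chosen here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pruning_rate(weights: Sequence[float]) -> float:
    """Percentage of weights that are exactly zero (assumed meaning of T)."""
    if not weights:
        raise ValueError("weights must be non-empty")
    zeros = sum(1 for w in weights if w == 0)
    return 100.0 * zeros / len(weights)


# Toy data, for illustration only.
target = [1, 1, 1, 0, 0, 0]
pred_unpruned = [1, 1, 1, 0, 0, 0]
pred_pruned = [1, 1, 0, 0, 0, 0]
decoder_weights = [0.0, 0.3, 0.0, 0.0, -0.1, 0.0, 0.0, 0.0, 0.0, 0.0]

dsc_unpruned = dice_score(pred_unpruned, target)
dsc_pruned = dice_score(pred_pruned, target)
delta_dsc_percent = 100.0 * (dsc_pruned - dsc_unpruned)  # assumed sign convention
rate_percent = pruning_rate(decoder_weights)
```

In this toy case the pruned mask misses one voxel, so the DSC drops from 1.0 to 0.8, and 8 of the 10 weights are zero. The figure reports the opposite regime: pruning rates mostly above 90% with $\Delta$DSC mostly within $\pm0.1\%$.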