SYN-LUNGS: Towards Simulating Lung Nodules with Anatomy-Informed Digital Twins for AI Training

Fakrul Islam Tushar, Lavsen Dahal, Cindy McCabe, Fong Chi Ho, Paul Segars, Ehsan Abadi, Kyle J. Lafata, Ehsan Samei, Joseph Y. Lo

TL;DR

SYN-LUNGS addresses data scarcity in lung nodule AI by introducing an anatomy-informed synthetic data framework that combines XCAT3-based digital twins, procedural nodule generation via X-Lesions, and physics-based CT imaging with DukeSim. The approach yields a large, annotated dataset (174 twins, 512 nodules across 1,044 CT scans). Models trained on clinical plus simulated data generalize better than clinical-only models in detection (roughly +10%), segmentation (+2–9%), and malignancy classification. The framework also enables targeted nodule synthesis through SYN-ControlNet, which gives control over lesion size and placement. By integrating anatomical fidelity with realistic imaging physics and controlled augmentation, it offers a scalable path to improve model reliability, especially for rare cases.

Abstract

AI models for lung cancer screening are limited by data scarcity, impacting generalizability and clinical applicability. Generative models address this issue but are constrained by training data variability. We introduce SYN-LUNGS, a framework for generating high-quality 3D CT images with detailed annotations. SYN-LUNGS integrates XCAT3 phantoms for digital twin generation, X-Lesions for nodule simulation (varying size, location, and appearance), and DukeSim for CT image formation with vendor and parameter variability. The dataset includes 3,072 nodule images from 1,044 simulated CT scans, with 512 lesions and 174 digital twins. Models trained on clinical + simulated data outperform clinical-only models, achieving a 10% improvement in detection, 2–9% in segmentation and classification, and enhanced synthesis. By incorporating anatomy-informed simulations, SYN-LUNGS provides a scalable approach for AI model development, particularly for representing rare diseases and improving model reliability.
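As a quick consistency check on the dataset counts quoted above, the ratios can be computed directly. This is only arithmetic on the reported numbers. The variable names are illustrative, and the reading that each twin and each lesion appears under six imaging configurations is an inference from the ratios, not something the abstract states.

```python
# Counts as reported in the abstract.
twins = 174
scans = 1044
lesions = 512
nodule_images = 3072

# Ratios implied by those counts (an inference, not a stated design parameter).
scans_per_twin = scans / twins
images_per_lesion = nodule_images / lesions

print(f"CT scans per digital twin: {scans_per_twin}")
print(f"Nodule images per lesion:  {images_per_lesion}")
```

Both ratios come out to exactly 6, which would be consistent with a fixed number of scanner or protocol settings applied per case.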

Paper Structure

This paper contains 14 sections, 1 equation, 6 figures.

Figures (6)

  • Figure 1: Overview of the SYN-LUNGS workflow, integrating XCAT3, X-Lesions, and DukeSim for digital twin and nodule simulation. Clinical and simulated datasets are used for nodule detection, segmentation, classification, and synthesis, with external evaluations using FROC, Dice, AUC, and qualitative analysis.
  • Figure 2: Simulated lung nodules with varying sizes and imaging conditions. Top: slices of the digital human twin models with embedded nodules. Middle: simulated CT with different scanner settings. Bottom: zoomed-in nodule views.
  • Figure 3: Dataset Distribution and Detection Performance. (a) CT scan and nodule distribution. (b) Nodule size density plots. (c) FROC curve: clinical (green) vs. clinical + simulated data (brown). (d) Detection results with axial CT slices (top) and zoomed-in views (bottom), showing nodule size and confidence scores.
  • Figure 4: Nodule segmentation dataset (a) and performance (b-f). (a) Histograms of nodule size distributions across datasets, with mean (red), median (blue), and 5-mm threshold (black) dashed lines for reference. (b-e) Box plots compare Dice scores across models and datasets: (b, c) MSD Task06 (internal) and (d, e) DLCSD24 (external). Left: all nodules; right: clinically significant sizes. Legends show segmented nodule counts. (f) Qualitative examples with nodule diameter and Dice scores.
  • Figure 5: Cancer classification experiment workflow. (a) Statistical labeling, (b) statistical labeling evaluation, (c) dataset distribution, (d) classification performance AUC, and (e) score.
  • ...and 1 more figure
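
The segmentation results in Figure 4 are reported as Dice scores. For readers unfamiliar with the metric, the following is a minimal sketch of the standard Dice overlap, 2|A∩B| / (|A| + |B|), computed on segmentations represented as sets of voxel indices. The function name `dice_score`, the set-based representation, and the convention for two empty masks are assumptions made for illustration; this is not the authors' evaluation code.

```python
def dice_score(pred_voxels, true_voxels):
    """Dice overlap between two segmentations given as collections of voxel indices."""
    pred, true = set(pred_voxels), set(true_voxels)
    if not pred and not true:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2 * len(pred & true) / (len(pred) + len(true))


# Toy example: a 4-voxel predicted nodule and a 4-voxel reference nodule
# that share exactly two voxels.
pred = {(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 0)}
ref = {(0, 1, 0), (1, 1, 0), (0, 2, 0), (1, 2, 0)}

print(dice_score(pred, ref))  # 2 * 2 / (4 + 4) = 0.5
```

Because the denominator is the total mask size, a fixed one-voxel offset costs a small nodule far more Dice than a large one, which is one reason a figure like Figure 4 would report all nodules and clinically significant sizes separately.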