Surf2CT: Cascaded 3D Flow Matching Models for Torso 3D CT Synthesis from Skin Surface

Siyeop Yoon, Yujin Oh, Pengfei Jin, Sifan Song, Matthew Tivnan, Dufan Wu, Xiang Li, Quanzheng Li

TL;DR

Surf2CT generates 3D torso CT volumes, and with them clinically meaningful internal anatomy, from external surface scans and basic demographics (age, sex, height, weight). It introduces a three-stage cascaded pipeline based on conditional flow matching and a 3D-adapted EDM2 backbone that maps $f_{\text{partial}} \rightarrow f_{\text{full}} \rightarrow X_{\text{low}} \rightarrow X_{\text{high}}$, i.e., surface completion, coarse CT synthesis, and high-resolution refinement. Trained on $3{,}198$ torso CTs from MGH and AutoPET, Surf2CT achieves strong anatomical fidelity: mean organ-volume differences range from -11.1% to 4.4%, muscle/fat body composition metrics correlate with ground truth at 0.67 to 0.96, and surface completion reduces the Chamfer distance from 521.8 mm to 2.7 mm while raising Intersection-over-Union from 0.87 to 0.98. This approach enables non-invasive, radiography-free internal imaging with potential for home-based screening and personalized clinical assessments. Its limitations include training-data bias toward cancer populations and sensitivity to real-world scanner noise, which underscore the need for broader validation and fairness considerations.
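To make the mapping above concrete, the following is a minimal toy sketch of conditional flow matching and of chaining three stages. It assumes the common straight-line (linear interpolation) probability path, which this summary does not confirm as the authors' exact choice. All function names are hypothetical, vectors are plain Python lists rather than 3D volumes, and the patch-wise nature of the third stage is ignored.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity network on the linear path (assumed)."""
    return [b - a for a, b in zip(x0, x1)]

def fm_loss(v_pred, v_target):
    """Mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)

def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = v(x, t, cond) from t=0 to t=1 with Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt, cond)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def cascade(stage1, stage2, stage3, f_partial, demographics, noise):
    """Chain three conditional samplers: f_partial -> f_full -> X_low -> X_high.

    Toy simplification: every stage starts from the same noise vector.
    """
    f_full = euler_sample(stage1, noise, f_partial)               # surface completion
    x_low = euler_sample(stage2, noise, (f_full, demographics))   # coarse CT synthesis
    x_high = euler_sample(stage3, noise, x_low)                   # refinement
    return x_high
```

In a real system each `stage*` argument would be a trained velocity network; here any callable with the signature `(x, t, cond)` works, which keeps the cascade structure visible without committing to an architecture.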

Abstract

We present Surf2CT, a novel cascaded flow matching framework that synthesizes full 3D computed tomography (CT) volumes of the human torso from external surface scans and simple demographic data (age, sex, height, weight). This is the first approach capable of generating realistic volumetric internal anatomy images solely based on external body shape and demographics, without any internal imaging. Surf2CT proceeds through three sequential stages: (1) Surface Completion, reconstructing a complete signed distance function (SDF) from partial torso scans using conditional 3D flow matching; (2) Coarse CT Synthesis, generating a low-resolution CT volume from the completed SDF and demographic information; and (3) CT Super-Resolution, refining the coarse volume into a high-resolution CT via a patch-wise conditional flow model. Each stage utilizes a 3D-adapted EDM2 backbone trained via flow matching. We trained our model on a combined dataset of 3,198 torso CT scans (approximately 1.13 million axial slices) sourced from Massachusetts General Hospital (MGH) and the AutoPET challenge. Evaluation on 700 paired torso surface-CT cases demonstrated strong anatomical fidelity: organ volumes exhibited small mean percentage differences (range from -11.1% to 4.4%), and muscle/fat body composition metrics matched ground truth with strong correlation (range from 0.67 to 0.96). Lung localization had minimal bias (mean difference -2.5 mm), and surface completion significantly improved metrics (Chamfer distance: from 521.8 mm to 2.7 mm; Intersection-over-Union: from 0.87 to 0.98). Surf2CT establishes a new paradigm for non-invasive internal anatomical imaging using only external data, opening opportunities for home-based healthcare, preventive medicine, and personalized clinical assessments without the risks associated with conventional imaging techniques.
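The evaluation metrics named in the abstract can be illustrated with a short sketch. The definitions below are common textbook forms (a symmetric Chamfer distance as the sum of mean nearest-neighbour distances in both directions, IoU on sets of occupied voxel indices, and a signed percentage difference). The abstract does not state which exact variants the authors use, so these are assumptions, and the function names are hypothetical.

```python
import math

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two point sets (assumed definition)."""
    def mean_nn(src, dst):
        # Mean distance from each point in src to its nearest neighbour in dst.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return mean_nn(points_a, points_b) + mean_nn(points_b, points_a)

def iou(voxels_a, voxels_b):
    """Intersection-over-Union of two collections of occupied voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def percent_difference(generated, reference):
    """Signed percentage difference of a generated value from its reference."""
    return 100.0 * (generated - reference) / reference
```

For example, a generated organ volume of 90 against a reference of 100 gives `percent_difference(90, 100) == -10.0`, the same sign convention under which a range such as -11.1% to 4.4% would indicate mild underestimation for some organs and mild overestimation for others.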

Paper Structure

This paper contains 21 sections, 14 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: Overview of the proposed cascaded generative pipeline. Our framework consists of three sequential stages: (1) Surface completion using conditional flow matching to estimate a full signed distance function (SDF) from partial scans; (2) coarse CT volume synthesis (at 8 mm isotropic resolution) conditioned on the completed SDF and demographic attributes; and (3) patch-wise CT super-resolution to refine the coarse CT into a detailed high-resolution volume (2 mm isotropic).
  • Figure 2: Qualitative comparison between original CT scans and Surf2CT-generated 3D volumes.
  • Figure 3: Body composition for two subjects comparing original CT and Surf2CT, highlighting consistent muscle, subcutaneous and visceral fat distributions across varying body types.
  • Figure 4: Qualitative visualization of surface completion. Top row illustrates volume-rendered CT alongside surface reconstruction results for partial input, restored, and original surfaces. Bottom row shows corresponding axial slices of normalized Signed Distance Functions (SDFs).
  • Figure 5: Quantitative (Bland-Altman, regression) and qualitative evaluation demonstrating accurate lung localization. The right panel visually compares lung localization in a representative subject with marked reference points indicating lung boundary positions.
  • ...and 4 more figures
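The resolutions quoted in the Figure 1 caption imply some simple arithmetic: going from 8 mm to 2 mm isotropic voxels is a 4x increase per axis, or 64x more voxels, which is one plausible reason for refining patch by patch. The sketch below works this out and enumerates patch positions. The field of view, patch size, and stride are hypothetical values chosen for illustration and are not taken from the paper.

```python
def grid_shape(fov_mm, spacing_mm):
    """Number of voxels per axis for a field of view at a given isotropic spacing."""
    return tuple(int(round(extent / spacing_mm)) for extent in fov_mm)

def patch_origins(shape, patch, stride):
    """Start indices of patches tiling a volume, always covering the far edge."""
    def starts(n, p, s):
        if n <= p:
            return [0]
        out = list(range(0, n - p + 1, s))
        if out[-1] != n - p:
            out.append(n - p)  # extra patch so the last voxels are covered
        return out
    axes = [starts(n, p, s) for n, p, s in zip(shape, patch, stride)]
    return [(z, y, x) for z in axes[0] for y in axes[1] for x in axes[2]]

# Hypothetical field of view in millimetres (not from the paper).
fov = (512, 512, 768)
coarse = grid_shape(fov, 8)   # stage 2 output grid
fine = grid_shape(fov, 2)     # stage 3 output grid
origins = patch_origins(fine, patch=(64, 64, 64), stride=(64, 64, 64))
```

With these assumed numbers the coarse grid is 64 x 64 x 96 voxels, the fine grid is 256 x 256 x 384, and non-overlapping 64-voxel patches give 96 patch positions. A smaller stride would produce overlapping patches that could be blended at the seams.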