Flow Matching for Conditional MRI-CT and CBCT-CT Image Synthesis
Arnela Hadzic, Simon Johannes Joham, Martin Urschler
TL;DR
This work addresses MRI-to-CT and CBCT-to-CT synthesis, producing synthetic CT (sCT) for MRI-only and CBCT-based adaptive radiotherapy. It introduces a fully 3D Flow Matching (FM) framework conditioned on the input MRI or CBCT: a Gaussian base volume $x_0 \sim \mathcal{N}(0, I)$ is transformed into an sCT $x_1$ by integrating a learned velocity field $v_\theta$. A lightweight 3D encoder extracts the conditioning features $c$ from the input image, and a 3D U-Net predicts the velocity. Separate models are trained for MRI $\rightarrow$ sCT and CBCT $\rightarrow$ sCT across abdomen, head-and-neck, and thorax on the SynthRAD2025 benchmark. At inference, the ODE $\dot{x}_t = v_\theta(x_t, t \mid c)$ is solved starting from $x_0$ with a 32-step RK4 solver. Results show accurate reconstruction of global anatomy but limited preservation of fine detail, attributed mainly to the $128^3$ training resolution; 3D patch-based training and latent-space flow models are identified as potential remedies. The approach is aimed at efficient conditional sCT synthesis that could support MRI-only workflows and in-room CBCT-based adaptive radiotherapy while reducing patient radiation exposure.
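The inference step described above can be illustrated with a minimal NumPy sketch of a fixed-step classical RK4 integrator. This is not the authors' implementation: the function names `rk4_sample` and `toy_velocity` are hypothetical, the toy velocity field stands in for the trained 3D U-Net $v_\theta$, and the $4^3$ volume is chosen only to keep the example fast.

```python
import numpy as np


def rk4_sample(velocity, x0, cond, n_steps=32):
    """Integrate dx/dt = velocity(x, t, cond) from t=0 to t=1 with classical RK4.

    `velocity` is any callable (x, t, cond) -> array shaped like x.
    In the paper's setting it would be the trained network v_theta;
    here it is a placeholder supplied by the caller.
    """
    x = np.asarray(x0, dtype=np.float64)
    h = 1.0 / n_steps  # fixed step size
    for i in range(n_steps):
        t = i * h
        k1 = velocity(x, t, cond)
        k2 = velocity(x + 0.5 * h * k1, t + 0.5 * h, cond)
        k3 = velocity(x + 0.5 * h * k2, t + 0.5 * h, cond)
        k4 = velocity(x + h * k3, t + h, cond)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x


def toy_velocity(x, t, cond):
    # Hypothetical stand-in for v_theta: pulls x toward the conditioning volume.
    # It has the closed-form solution x(1) = cond + (x0 - cond) * exp(-1).
    return cond - x


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4, 4))    # Gaussian base volume x_0 ~ N(0, I)
cond = rng.standard_normal((4, 4, 4))  # placeholder for conditioning features c
sct = rk4_sample(toy_velocity, x0, cond, n_steps=32)
```

Each RK4 step evaluates the velocity four times, so a 32-step solve corresponds to 128 network evaluations per generated volume when the velocity is a neural network.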
Abstract
Generating synthetic CT (sCT) from MRI or CBCT plays a crucial role in enabling MRI-only and CBCT-based adaptive radiotherapy, improving treatment precision while reducing patient radiation exposure. To address this task, we adopt a fully 3D Flow Matching (FM) framework, motivated by recent work demonstrating FM's efficiency in producing high-quality images. In our approach, a Gaussian noise volume is transformed into an sCT image by integrating a learned FM velocity field, conditioned on features extracted from the input MRI or CBCT using a lightweight 3D encoder. We evaluated the method on the SynthRAD2025 Challenge benchmark, training separate models for MRI $\rightarrow$ sCT and CBCT $\rightarrow$ sCT across three anatomical regions: abdomen, head and neck, and thorax. Validation and testing were performed through the challenge submission system. The results indicate that the method accurately reconstructs global anatomical structures; however, preservation of fine details was limited, primarily due to the relatively low training resolution imposed by memory and runtime constraints. Future work will explore patch-based training and latent-space flow models to improve resolution and local structural fidelity.
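For readers unfamiliar with FM, the training objective for the velocity field is commonly written as below. This is a sketch under the assumption of a linear (straight-line) path between the noise volume and the target CT, which the abstract does not confirm; the uniform sampling of $t$ is likewise an assumption. Here $x_1$ denotes the ground-truth CT and $c$ the features extracted from the paired MRI or CBCT.

$$
x_t = (1 - t)\,x_0 + t\,x_1, \qquad x_0 \sim \mathcal{N}(0, I), \quad t \sim \mathcal{U}[0, 1]
$$

$$
\mathcal{L}(\theta) = \mathbb{E}_{t,\,x_0,\,(x_1, c)} \left\| v_\theta(x_t, t \mid c) - (x_1 - x_0) \right\|_2^2
$$

Under this path the regression target $x_1 - x_0$ is constant along each trajectory, which is one reason FM models can often be sampled with relatively few ODE solver steps.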
