Deep-Motion-Net: GNN-based volumetric organ shape reconstruction from single-view 2D projections
Isuru Wijesinghe, Michael Nix, Arezoo Zakeri, Alireza Hokmabadi, Bashar Al-Qaisieh, Ali Gooya, Zeike A. Taylor
TL;DR
Deep-Motion-Net is an end-to-end graph neural network that reconstructs full 3D organ volumes from a single in-treatment kV projection acquired at an arbitrary angle. It does so by mapping CNN-derived image features to per-vertex displacements of a patient-specific tetrahedral mesh. The architecture supplies projection-angle information through an angle channel, uses four feature pooling networks to attach image features to mesh nodes, and employs a graph attention deformation network to produce smooth, physically plausible volumetric deformations. Training relies on synthetically generated paired data, using SuPReMo-based motion models to generate organ motion and a conditional CycleGAN to transform digitally reconstructed radiographs (DRRs) to kV style. The model is evaluated on synthetic benchmarks and on real kV images from liver cancer patients. On synthetic data, mean vertex errors are sub-millimeter, with higher localized peak errors; on real data, the model shows statistically significant improvements over surface-only, fixed-angle baselines. These findings suggest potential for intra-treatment motion management without fiducial markers or MR-linac imaging. By predicting organ motion from readily available kV imaging, the work offers a scalable path toward inter- and intra-fraction dose adaptation, which could improve the therapeutic ratio in radiotherapy.
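To make "attaching image features to mesh nodes" concrete, the sketch below shows one common way to do it: project each template vertex onto the detector for the given projection angle, then bilinearly sample a 2D feature map at that location. It is an illustration only, not the authors' feature pooling network. The geometry is an assumption (a point source rotating about the superior-inferior z axis, isocentre at the origin, hypothetical source-to-axis distance `sad` and source-to-detector distance `sdd` in mm), and `project_vertex`, `bilinear_sample`, and `pool_vertex_features` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
FeatureMap = List[List[List[float]]]  # indexed [channel][row][col]


def project_vertex(p: Vec3, angle_deg: float, sad: float = 1000.0,
                   sdd: float = 1500.0, spacing: float = 1.0,
                   width: int = 64, height: int = 64) -> Tuple[float, float]:
    """Map a 3D point (mm, isocentre at origin) to detector (row, col).

    Hypothetical geometry: point source rotating about the z axis at
    distance `sad`, flat detector at distance `sdd` from the source.
    """
    x, y, z = p
    t = math.radians(angle_deg)
    # Rotate into the gantry frame; the beam travels along +y'.
    xr = x * math.cos(t) + y * math.sin(t)
    yr = -x * math.sin(t) + y * math.cos(t)
    # Source sits at y' = -sad, so magnification depends on the point's depth.
    mag = sdd / (sad + yr)
    u, v = xr * mag, z * mag  # detector-plane coordinates in mm
    col = u / spacing + (width - 1) / 2.0
    row = v / spacing + (height - 1) / 2.0
    return row, col


def bilinear_sample(fmap: FeatureMap, row: float, col: float) -> List[float]:
    """Bilinearly interpolate every channel at (row, col), clamping to the border."""
    h, w = len(fmap[0]), len(fmap[0][0])
    row = min(max(row, 0.0), h - 1.0)
    col = min(max(col, 0.0), w - 1.0)
    r0, c0 = int(math.floor(row)), int(math.floor(col))
    r1, c1 = min(r0 + 1, h - 1), min(c0 + 1, w - 1)
    dr, dc = row - r0, col - c0
    out = []
    for ch in fmap:
        top = ch[r0][c0] * (1 - dc) + ch[r0][c1] * dc
        bottom = ch[r1][c0] * (1 - dc) + ch[r1][c1] * dc
        out.append(top * (1 - dr) + bottom * dr)
    return out


def pool_vertex_features(vertices: Sequence[Vec3], fmap: FeatureMap,
                         angle_deg: float, spacing: float) -> List[List[float]]:
    """Give each mesh vertex the image features found where it projects."""
    h, w = len(fmap[0]), len(fmap[0][0])
    feats = []
    for p in vertices:
        row, col = project_vertex(p, angle_deg, spacing=spacing,
                                  width=w, height=h)
        feats.append(bilinear_sample(fmap, row, col))
    return feats
```

Because the angle enters the projection, the same template vertex lands on different detector pixels at different gantry angles, which is one reason the network needs to know the projection angle.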
Abstract
We propose Deep-Motion-Net: an end-to-end graph neural network (GNN) architecture that enables 3D (volumetric) organ shape reconstruction from a single in-treatment kV planar X-ray image acquired at an arbitrary projection angle. Estimating and compensating for true anatomical motion during radiotherapy is essential for improving the delivery of planned radiation dose to target volumes while sparing organs-at-risk, and thereby improving the therapeutic ratio. Achieving this using only the limited imaging available during irradiation, without surrogate signals or invasive fiducial markers, is attractive. The proposed model learns to regress the deformed organ mesh from a patient-specific template and deep features extracted from kV images at arbitrary projection angles. A 2D-CNN encoder extracts image features, and four feature pooling networks fuse these features into the 3D template organ mesh. A ResNet-based graph attention network then deforms the feature-encoded mesh. The model is trained using synthetically generated organ motion instances and corresponding kV images. The kV images are generated by deforming a reference CT volume aligned with the template mesh, creating digitally reconstructed radiographs (DRRs) at the required projection angles, and transferring the DRRs to kV style with a conditional CycleGAN model. The overall framework was tested quantitatively on synthetic respiratory motion scenarios and qualitatively on in-treatment images acquired over full scan series for liver cancer patients. Overall mean prediction errors for the four synthetic motion test datasets were 0.16 ± 0.13 mm, 0.18 ± 0.19 mm, 0.22 ± 0.34 mm, and 0.12 ± 0.11 mm, and the corresponding mean peak prediction errors were 1.39 mm, 1.99 mm, 3.29 mm, and 1.16 mm.
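The deformation network is described here only as a ResNet-based graph attention network. As a rough illustration, the sketch below implements a generic single-head graph attention layer (the standard GAT formulation) wrapped in a residual connection, followed by a linear read-out that turns each vertex's features into a 3D displacement added to its template position. The weights `w`, `a`, and `w_out` are hypothetical stand-ins for trained parameters; `w` must be square so the residual sum is defined; and the layer count, activations, and head configuration of Deep-Motion-Net are not reproduced.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vec3 = Tuple[float, float, float]


def matvec(w: Matrix, x: Sequence[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def elu(x: float) -> float:
    return x if x > 0 else math.expm1(x)


def gat_layer(h: List[List[float]], adj: List[List[int]],
              w: Matrix, a: Sequence[float]) -> List[List[float]]:
    """Generic single-head graph attention layer (hypothetical, not the paper's).

    Vertex i attends over itself and its mesh neighbours j with
    alpha_ij = softmax_j(LeakyReLU(a . [W h_i || W h_j])).
    """
    z = [matvec(w, hi) for hi in h]
    out = []
    for i in range(len(h)):
        nbrs = [i] + [j for j in adj[i] if j != i]
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
                  for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        out.append([sum(al * z[j][k] for al, j in zip(alpha, nbrs))
                    for k in range(len(z[i]))])
    return out


def deform_mesh(vertices: Sequence[Vec3], feats: List[List[float]],
                adj: List[List[int]], w: Matrix, a: Sequence[float],
                w_out: Matrix) -> List[Vec3]:
    """Residual block h + ELU(GAT(h)), then a linear read-out (3 x F) giving a
    per-vertex displacement that is added to the template vertex."""
    g = gat_layer(feats, adj, w, a)
    h = [[hk + elu(gk) for hk, gk in zip(hi, gi)] for hi, gi in zip(feats, g)]
    disp = [matvec(w_out, hi) for hi in h]
    return [tuple(p + d for p, d in zip(v, dv)) for v, dv in zip(vertices, disp)]
```

With `w_out` set to zeros the template mesh is returned unchanged, which reflects the choice of predicting displacements relative to a template rather than absolute vertex coordinates.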
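For each synthetic test dataset, the results are reported as a mean ± standard deviation error and a mean peak error, but the aggregation is not spelled out. The sketch below assumes per-vertex Euclidean error, mean and population standard deviation pooled over all vertices of all test instances, and a peak defined as each instance's maximum vertex error averaged over instances; all of these choices, and the function names, are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def vertex_errors(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> List[float]:
    """Euclidean distance (mm) between corresponding predicted and true vertices."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def summarise(instances: Sequence[Tuple[Sequence[Vec3], Sequence[Vec3]]]
              ) -> Tuple[float, float, float]:
    """Return (mean, std, mean peak) under the assumptions stated above."""
    all_errs: List[float] = []
    peaks: List[float] = []
    for pred, gt in instances:
        errs = vertex_errors(pred, gt)
        all_errs.extend(errs)
        peaks.append(max(errs))  # worst vertex in this motion instance
    mean = sum(all_errs) / len(all_errs)
    # Population standard deviation over every vertex error in the dataset.
    std = math.sqrt(sum((e - mean) ** 2 for e in all_errs) / len(all_errs))
    return mean, std, sum(peaks) / len(peaks)
```

Pooling over vertices explains how a small mean can coexist with a much larger peak: a few badly predicted vertices barely move the mean but set the per-instance maximum.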
