Deep-Motion-Net: GNN-based volumetric organ shape reconstruction from single-view 2D projections

Isuru Wijesinghe; Michael Nix; Arezoo Zakeri; Alireza Hokmabadi; Bashar Al-Qaisieh; Ali Gooya; Zeike A. Taylor

TL;DR

Deep-Motion-Net is an end-to-end graph neural network that reconstructs full 3D organ volumes from a single in-treatment kV projection acquired at an arbitrary angle, by mapping CNN-derived image features to per-vertex displacements on a patient-specific tetrahedral mesh. The architecture encodes the projection angle in a dedicated angle channel, uses four feature pooling networks to attach image cues to mesh nodes, and employs a graph attention deformation network to produce smooth, physically plausible volumetric deformations. Training relies on synthetically generated paired data: motion states from SuPReMo-based motion models, with DRRs converted to kV style by a conditional CycleGAN. The model is evaluated on synthetic benchmarks and on real kV images from liver cancer patients. Key findings are sub-millimeter mean vertex errors on synthetic data, with localized higher peaks, and statistically significant improvements over surface-only, fixed-angle baselines on real data, which highlights the potential for intra-treatment motion management without fiducial markers or MR-linac imaging. The work offers a scalable path toward inter- and intra-fraction dose adaptation by predicting organ motion from readily available kV imaging, potentially improving the therapeutic ratio in radiotherapy.
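One plausible way to picture the angle channel is as an extra image plane holding the projection angle, stacked with the kV image before it enters the 2D-CNN. The sketch below illustrates that idea only. The function name `add_angle_channel`, the normalisation to [0, 1) over a full revolution, and the constant-valued map are assumptions made for illustration; the summary does not state the encoding actually used.

```python
import numpy as np


def add_angle_channel(image, angle_deg):
    """Stack a constant angle channel onto a single-channel projection image.

    Assumption: the angle is normalised to [0, 1) over a full revolution and
    broadcast to every pixel. The encoding used in the paper may differ.
    """
    image = np.asarray(image, dtype=np.float32)
    if image.ndim != 2:
        raise ValueError("expected a 2D (height, width) image")

    # Wrap the angle into [0, 360) and scale it to [0, 1).
    angle_norm = (angle_deg % 360.0) / 360.0

    # Constant-valued plane with the same spatial size as the image.
    angle_map = np.full(image.shape, angle_norm, dtype=np.float32)

    # Channel-first layout: index 0 is the image, index 1 is the angle.
    return np.stack([image, angle_map], axis=0)


# Hypothetical 4x5 image acquired at 90 degrees.
stacked = add_angle_channel(np.zeros((4, 5)), 90.0)
print(stacked.shape)
```

A constant plane keeps the input compatible with an ordinary convolutional encoder, at the cost of some redundancy; other encodings (for example sine and cosine of the angle) would avoid the discontinuity at 0/360 degrees.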

Abstract

We propose Deep-Motion-Net: an end-to-end graph neural network (GNN) architecture that enables 3D (volumetric) organ shape reconstruction from a single in-treatment kV planar X-ray image acquired at an arbitrary projection angle. Estimating and compensating for true anatomical motion during radiotherapy is essential for improving the delivery of planned radiation dose to target volumes while sparing organs-at-risk, thereby improving the therapeutic ratio. Achieving this using only the limited imaging available during irradiation, and without surrogate signals or invasive fiducial markers, is attractive. The proposed model learns the mesh regression from a patient-specific template and deep features extracted from kV images at arbitrary projection angles. A 2D-CNN encoder extracts image features, and four feature pooling networks fuse these features to the 3D template organ mesh. A ResNet-based graph attention network then deforms the feature-encoded mesh. The model is trained using synthetically generated organ motion instances and corresponding kV images. The latter are generated by deforming a reference CT volume aligned with the template mesh, creating digitally reconstructed radiographs (DRRs) at the required projection angles, and applying DRR-to-kV style transfer with a conditional CycleGAN model. The overall framework was tested quantitatively on synthetic respiratory motion scenarios and qualitatively on in-treatment images acquired over full scan series for liver cancer patients. Overall mean prediction errors for synthetic motion test datasets were 0.16$\pm$0.13 mm, 0.18$\pm$0.19 mm, 0.22$\pm$0.34 mm, and 0.12$\pm$0.11 mm. Mean peak prediction errors were 1.39 mm, 1.99 mm, 3.29 mm, and 1.16 mm.
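The reported mean and peak errors can be read as summaries of per-vertex distances between predicted and ground-truth mesh positions. The following is a minimal sketch of such a computation, assuming the error at each vertex is the Euclidean distance in millimetres and that the spread is a population standard deviation. The function names and the toy coordinates are hypothetical, and the exact aggregation across test cases used by the authors is not given in the abstract.

```python
import math
from statistics import mean, pstdev


def vertex_errors(predicted, ground_truth):
    """Euclidean distance (mm) between corresponding mesh vertices."""
    if len(predicted) != len(ground_truth):
        raise ValueError("meshes must have the same number of vertices")
    return [math.dist(p, g) for p, g in zip(predicted, ground_truth)]


def summarise_errors(errors):
    """Mean, population standard deviation, and peak of per-vertex errors."""
    return {
        "mean": mean(errors),
        "std": pstdev(errors),
        "peak": max(errors),
    }


# Toy three-vertex mesh with made-up coordinates in mm.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
ground_truth = [(0.0, 0.0, 0.1), (1.0, 0.3, 0.4), (0.0, 2.0, 0.0)]

summary = summarise_errors(vertex_errors(predicted, ground_truth))
print(summary)
```

Under this reading, a "mean peak" error would be the per-case peak averaged over all test cases, which is why it can sit well above the overall mean.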

Paper Structure (27 sections, 4 equations, 6 figures, 7 tables)

Introduction
Previous work: 3D shape reconstruction from single-view projections
RGB image-based reconstruction
Projection image-based reconstruction
Recapitulation
Methodology
3D organ shape representation
Model architecture
Incorporating projection angle
2D-CNN configuration
Feature pooling networks
Mesh deformation network
Loss functions
Implementation and training details
Synthetic dataset generation
...and 12 more sections

Figures (6)

Figure 1: Illustration of the Deep-Motion-Net architecture. A 2D-CNN image encoder extracts projection angle-dependent semantic features from an input kV X-ray image. A feature pooling layer comprising four learnable feature pooling networks attaches these features to the appropriate vertices in the patient-specific template mesh. Finally, a graph-attention-based network predicts the corresponding mesh deformation.
Figure 2: Example surrogate signals: original signals associated with the input 4D-CT data (red) and randomly generated variations from these (grey), used in turn to synthesise new motion states. The first and second signals are plotted at the left and right, respectively.
Figure 3: Effect of projection angle on prediction accuracy: box and whisker plots of mean (top) and peak (bottom) prediction errors grouped according to image projection angle (degrees). Each box and whisker shows the distribution of errors for the indicated projection angle using all deformation states in the test set. For clarity of visualisation, angles are further grouped into 10 equal bins covering a full revolution. Results for patients 1 (blue), 2 (yellow), 3 (green), and 4 (red) are shown for each bin.
Figure 4: Visualisations of ground-truth deformed (left column), template (middle column), and estimated deformed (right column) 3D liver shapes. Meshes are overlaid on the deformed 3D-CT volume. Rows 1-3 show, respectively, axial, coronal, and sagittal views. Results are shown for the worst performing test case for patient 1: image projection angle $80.849^\circ$, and deformation state producing highest errors. Contours in the right column indicate the spatial distribution of errors on the surface. Similar results for patients 2-4 are presented in the supplementary material.
Figure 5: Illustration of the process of MI-based assessment of model prediction accuracy.
...and 1 more figure
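The Figure 3 caption describes grouping projection angles into 10 equal bins covering a full revolution. A minimal sketch of that grouping is given here, assuming 36-degree bins starting at 0 degrees; the helper names `bin_index` and `group_by_angle` and the sample values are hypothetical, and the bin origin used in the paper is not stated.

```python
from collections import defaultdict


def bin_index(angle_deg, n_bins=10):
    """Index of the equal-width angular bin containing angle_deg."""
    width = 360.0 / n_bins
    index = int((angle_deg % 360.0) // width)
    # Guard against floating-point wrap-around landing exactly on 360.
    return min(index, n_bins - 1)


def group_by_angle(samples, n_bins=10):
    """Group (angle_deg, error_mm) pairs by angular bin."""
    groups = defaultdict(list)
    for angle_deg, error_mm in samples:
        groups[bin_index(angle_deg, n_bins)].append(error_mm)
    return dict(groups)


# Made-up (angle, mean error) pairs for illustration only.
samples = [(5.0, 0.12), (80.849, 0.31), (100.0, 0.20), (359.9, 0.15)]
print(group_by_angle(samples))
```

Each resulting group would then supply the values for one box and whisker per patient.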
