Distributed Stochastic Optimization of a Neural Representation Network for Time-Space Tomography Reconstruction

K. Aditya Mohan; Massimiliano Ferrucci; Chuck Divin; Garrett A. Stevenson; Hyojin Kim

TL;DR

This work introduces Distributed Implicit Neural Representation (DINR) for 4D CT reconstruction, addressing the ill-posedness and limited-angle artifacts that arise when reconstructing rapidly changing scenes. DINR learns a continuous 4D representation of the object by optimizing a neural network that maps space-time coordinates to the local linear attenuation coefficient, using a forward model that samples a subset of coordinates along rays to compute projections. A distributed stochastic training scheme enables training on large HPC clusters, achieving high-fidelity reconstructions at terabyte-scale data sizes and substantially lower per-GPU memory requirements than voxel-grid INRs. Across experimental and simulated datasets, DINR outperforms state-of-the-art methods in PSNR/SSIM and visual fidelity, demonstrating strong scalability and the potential for fast, high-resolution 4D CT in dynamic in-situ experiments.

Abstract

4D time-space reconstruction of dynamic events or deforming objects using X-ray computed tomography (CT) is an important inverse problem in non-destructive evaluation. Conventional back-projection-based reconstruction methods assume that the object remains static for the duration of several tens or hundreds of X-ray projection measurement images (i.e., reconstruction of consecutive limited-angle CT scans). However, this assumption is unrealistic for many in-situ experiments and causes spurious artifacts and inaccurate morphological reconstructions of the object. To solve this problem, we propose to perform a 4D time-space reconstruction using a distributed implicit neural representation (DINR) network that is trained using a novel distributed stochastic training algorithm. Our DINR network learns to reconstruct the object at its output by iteratively optimizing its network parameters such that the output of the CT forward measurement model best matches the measured projection images. We use a forward measurement model that is a function of the DINR outputs at a sparsely sampled set of continuous-valued 4D object coordinates. Unlike previous neural representation architectures that forward and back propagate through dense voxel grids that sample the object's entire time-space coordinates, we only propagate through the DINR at a small subset of object coordinates in each iteration, resulting in an order-of-magnitude reduction in memory and compute for training. DINR leverages distributed computation across several compute nodes and GPUs to produce high-fidelity 4D time-space reconstructions. We use both simulated parallel-beam and experimental cone-beam X-ray CT datasets to demonstrate the superior performance of our approach.
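The idea of evaluating the forward model only at a sparse set of continuous 4D coordinates can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tiny network (`init_mlp`, `mlp_lac`), the sinusoidal activation, the straight-ray geometry given by `starts` and `ends`, the stratified random sampling along each ray, and the squared-error loss are all assumptions made for illustration. Only the forward pass and the loss on one random mini-batch of projection pixels are shown; a real training loop would also back-propagate through the network.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(sizes, rng):
    """Random weights for a small fully connected network (hypothetical stand-in for the DINR)."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_lac(params, coords):
    """Map (x, y, z, t) coordinates of shape (..., 4) to a non-negative LAC value."""
    h = coords
    for W, b in params[:-1]:
        h = np.sin(h @ W + b)            # sinusoidal activation (an assumption)
    W, b = params[-1]
    out = (h @ W + b)[..., 0]
    return np.logaddexp(0.0, out)        # softplus keeps the LAC non-negative

def project(params, starts, ends, times, n_samples, rng):
    """Estimate one line integral per ray from n_samples points on that ray."""
    n_rays = starts.shape[0]
    # Stratified random positions in [0, 1) along each ray.
    u = (np.arange(n_samples) + rng.random((n_rays, n_samples))) / n_samples
    pts = starts[:, None, :] + u[..., None] * (ends - starts)[:, None, :]
    # Every sample on a ray shares the time of that ray's projection image.
    t = np.broadcast_to(times[:, None, None], (n_rays, n_samples, 1))
    coords = np.concatenate([pts, t], axis=-1)       # (n_rays, n_samples, 4)
    lac = mlp_lac(params, coords)                    # (n_rays, n_samples)
    length = np.linalg.norm(ends - starts, axis=-1)  # ray length inside the volume
    return length * lac.mean(axis=-1)                # Monte Carlo line integral

def local_loss(params, idx, starts, ends, times, measured, n_samples, rng):
    """Mean squared error over a small subset of projection pixels (assumed loss form)."""
    est = project(params, starts[idx], ends[idx], times[idx], n_samples, rng)
    return np.mean((measured[idx] - est) ** 2)

# Synthetic stand-in data: 10,000 projection pixels, each with its own ray and time.
n_total = 10_000
starts = rng.uniform(-1.0, 1.0, (n_total, 3))
ends = rng.uniform(-1.0, 1.0, (n_total, 3))
times = rng.uniform(0.0, 1.0, n_total)
measured = rng.uniform(0.0, 2.0, n_total)

params = init_mlp([4, 32, 32, 1], rng)
batch = rng.choice(n_total, size=64, replace=False)  # small subset of pixels
loss = local_loss(params, batch, starts, ends, times, measured, 16, rng)
print(f"network evaluations this iteration: {batch.size * 16}, loss = {loss:.4f}")
```

In this sketch each iteration queries the network at only 64 × 16 coordinates, regardless of how finely the object would later be sampled at inference time, which is the property the abstract contrasts with dense voxel-grid propagation.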

Paper Structure (16 sections, 9 equations, 18 figures, 1 table)

Introduction
Our Approach
Results
Experimental Data Evaluation
Simulated Data Evaluation
Discussion
Conclusion
Acknowledgments
Linear Projection
Network Architecture
Training Parameters
Supplementary Document
Preprocessing of Experimental Data
Training Loop
Ablation Studies
...and 1 more sections

Figures (18)

Figure 1: Schematic of our distributed implicit neural representation (DINR) approach to 4DCT reconstruction. The projection estimate $\bar{p}_i$ is a function of the linear attenuation coefficient (LAC) at coordinates $r_{i,j}$ inside the orange-colored pyramidal volume shown in (a, b). The output of the DINR network, $\mathcal{M}(r_{i,j}; \gamma)$, gives the LAC at coordinate $r_{i,j}$. (c) shows the loss function that is local to each process and computed over a small subset $\Omega_k$ of projection indices. For the projection image at the $m^{th}$ view, the time $t_i$ of every projection pixel equals $T_m$, since all pixels in an image share the same time instant. (d) shows our distributed approach to training the DINR network.
Figure 2: 4DCT of samples under compression in a Deben stage that is mounted in a cone-beam X-ray CT system. (a) and (b) show the log-pile and SiC samples, respectively, used for the 4DCT scans. (c) shows the Deben stage used for in-situ compression of the samples. (d) shows the Zeiss Xradia 510 Versa cone-beam X-ray imaging system used for 4DCT acquisitions. (e) and (f) show the X-ray projection images of (a) and (b) respectively at different view angles. $T_m$ in the column labels of (e, f) indicates the time of the projection images at the $m^{th}$ view. In (e, f), the $2^{nd}$ row shows magnified views of the first-row images in the region denoted by the rectangular box at time $T_0$.
Figure 3: 4DCT reconstruction of the log-pile sample (Fig. 2 (a)). (a) shows the 3D ISO surface of the 4D reconstruction and the cross-section images of the LAC using the high-resolution DINR at various times. (b) is a reconstruction comparison of the cross-section images of the LAC between the conventional 4D FDK and our DINR approach. The PSNR/SSIM values are embedded in the images of (b). The high-resolution DINR images are the best visual match for the ground-truth while also producing the highest PSNR and SSIM. A comparison figure in the supplementary document demonstrates the advantage of DINR compared to the Regularized Weighted Least Squares (RWLS) algorithm LEAPCT with total variation regularization in 3D.
Figure 4: 4DCT reconstruction of the SiC sample (Fig. 2 (b)). (a) shows the 3D AMIP volumes of the high-resolution 4D DINR reconstruction with clearly resolved propagation of cracks over time. (b) shows the high- and low-resolution DINR reconstructions of a cross-axial slice at different times. The low-resolution DINR is unable to clearly resolve the cracks. The high-resolution DINR produces the best reconstruction and clearly resolves the cracks in all cases. The first two images and the last four images in (c) show the 2-frame FDK and the 4-frame FDK reconstructions, respectively. The 2-frame FDK suffers from motion blur while the 4-frame FDK has substantial limited-angle artifacts.
Figure 5: Plot of the total loss function, $L\left(\Omega_*, \gamma\right)$, vs. iterations for the log-pile experimental data reconstruction using High-Res. DINR. Each iteration takes approximately $0.214$ seconds when training on $128$ Nvidia V100 GPUs. We observe that the convergence of the loss function is highly stable. The time for one 3D inference, i.e., the time taken for a single 3D volume reconstruction (Table 1), is approximately $38$ minutes using $48$ GPUs.
...and 13 more figures
