Physics-Aware Neural Operators for Direct Inversion in 3D Photoacoustic Tomography

Jiayun Wang, Yousuf Aborahama, Arya Khokhar, Yang Zhang, Chuwei Wang, Karteekeya Sastry, Julius Berner, Yilin Luo, Boris Bonev, Zongyi Li, Kamyar Azizzadenesheli, Lihong V. Wang, Anima Anandkumar

TL;DR

This work introduces PANO (PACT imaging neural operator), an end-to-end physics-aware neural operator that directly learns the inverse mapping from raw sensor measurements to a 3D volumetric image and generalizes across input sampling densities without retraining.

Abstract

Learning physics-constrained inverse operators, rather than post-processing physics-based reconstructions, is a broadly applicable strategy for problems with expensive forward models. We demonstrate this principle in three-dimensional photoacoustic computed tomography (3D PACT), where current systems demand dense transducer arrays and prolonged scans, restricting clinical translation. We introduce PANO (PACT imaging neural operator), an end-to-end physics-aware neural operator that directly learns the inverse mapping from raw sensor measurements to a 3D volumetric image. As a neural operator, it is a deep learning architecture that generalizes across input sampling densities without retraining. Unlike two-step methods that reconstruct and then denoise, PANO performs direct inversion in a single pass, jointly embedding physics and data priors. It employs spherical discrete-continuous convolutions to respect the hemispherical sensor geometry and Helmholtz equation constraints to ensure physical consistency. PANO reconstructs high-quality images from both simulated and real data across diverse sparse acquisition settings, achieves real-time inference, and outperforms the widely used UBP (universal back-projection) algorithm by approximately 33 percentage points in cosine similarity on simulated data and 14 percentage points on real phantom data. These results establish a pathway toward more accessible 3D PACT systems for preclinical research and motivate future in-vivo validation for clinical translation.
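The headline numbers in the abstract are gaps in cosine similarity, reported in percentage points. The following is a minimal NumPy sketch of how such a gap could be computed between two reconstructions of the same reference volume. It also includes PSNR and NMSE, which the figure captions report. The exact metric definitions used in the paper are not given in this summary, so the normalizations below (reference energy for NMSE, reference peak for PSNR) are assumptions, and the random volumes are placeholders rather than PACT data.

```python
import numpy as np


def cosine_similarity(recon, ref):
    """Cosine similarity between two volumes, treated as flat vectors."""
    a = np.asarray(recon, dtype=np.float64).ravel()
    b = np.asarray(ref, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def nmse(recon, ref):
    """Squared error normalized by the reference energy (assumed definition)."""
    recon = np.asarray(recon, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    return float(np.sum((recon - ref) ** 2) / np.sum(ref ** 2))


def psnr(recon, ref):
    """PSNR in dB, using the reference peak as signal maximum (assumed definition)."""
    recon = np.asarray(recon, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    mse = np.mean((recon - ref) ** 2)
    if mse == 0:
        return float("inf")
    peak = np.max(np.abs(ref))
    return float(10.0 * np.log10(peak ** 2 / mse))


# Placeholder data: a random "reference" volume and two degraded copies.
rng = np.random.default_rng(0)
reference = rng.random((16, 16, 16))
recon_a = reference + 0.1 * rng.standard_normal(reference.shape)  # mild noise
recon_b = reference + 0.5 * rng.standard_normal(reference.shape)  # heavy noise

# Gap between the two methods, expressed in percentage points.
gain_pp = 100.0 * (cosine_similarity(recon_a, reference)
                   - cosine_similarity(recon_b, reference))
print(f"cosine-similarity gap: {gain_pp:.1f} percentage points")
```

A percentage-point gap is the plain difference of two scores on a 0 to 100 scale, not a relative improvement, which is why the sketch subtracts the two similarities before scaling.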

Paper Structure

This paper contains 17 sections, 20 equations, 6 figures, 5 tables.

Figures (6)

  • Figure 1: Overview: The proposed Pano (Photoacoustic imaging neural operator) reconstructs a 3D volumetric (voxel) image from photoacoustic radio-frequency (RF) measurements. a, Schematic diagram of the imaging system, which uses a hemispherical ultrasound (US) transducer array. The target is placed on top of the US detection surface of the transducer array, and a laser illuminates the target. The photoacoustic (PA) waves are detected by the sensor and passed to a data acquisition system for further processing and reconstruction. b, Arrangements of the transducer elements under subsampled measurement patterns (used to accelerate imaging): full, uniformly subsampled measurements at 6$\times$ and 10$\times$ in azimuth (top), and limited angle (3$\times$ acceleration) in azimuth and elevation. c, Overall architecture of the proposed deep learning framework Pano for end-to-end 3D reconstruction. A neural operator transforms the PA wave $\Psi$ into the 3D volumetric image $P$ and is designed to be agnostic to the sampling rate of the PA wave. As a cycle-consistency check, the reconstruction is projected back to PA waves, and a physics loss penalizes deviations of the re-projected PA wave from the input $\Psi$. d, Reconstruction performance (cosine similarity) and inference speed of different reconstruction algorithms on real experimental data. Pano achieves better reconstruction performance with faster inference than baseline methods such as the deep-learning-based Denoiser [choi2023DLPACT] and the iterative solver [xu2002exact]. Compared with the widely adopted state-of-the-art reconstruction method UBP (universal back-projection algorithm) [xu2005universal], Pano achieves a 10-percentage-point gain in reconstruction performance with similar inference time. (Inference setting: $6\times$ uniform subsampling.) e, PA MAP image comparisons of different methods. Pano outperforms the other methods, reconstructing 3D structures with higher fidelity and lower noise. MAP, maximum amplitude projection.
  • Figure 2: Pano design and architecture. a, Conceptual comparison of different methods. 1) Solver-based methods such as UBP [xu2005universal] directly invert the input $\Psi$ with a physical imaging model. 2) Reconstruct-then-denoise methods such as [choi2023DLPACT] first invert the input $\Psi$ with the physics model and then use a network (e.g. U-Net) to denoise and refine the reconstruction. 3) The proposed Pano is the first end-to-end method for 3D PACT reconstruction. It directly inverts $\Psi$ with a resolution-agnostic neural operator. Pano is also physics-aware because it enforces the physical model during training. b, The design of Pano accounts for the physical model (sensing matrix) $A$ of the imaging process $\Psi = AP$, where $A$ is given by the Helmholtz equation. Specifically, because the Helmholtz equation is time-independent, the components $y_{k_i}$ of the input PA wave at different frequencies $k_i$ are first processed independently by a local feature learning component, spherical DISCO (discrete-continuous convolution). Spherical DISCO is a neural operator block that mimics spherical convolution and makes Pano agnostic to different subsamplings of the input measurement data (see the spherical DISCO figure). Multi-frequency features are then combined and fed to the global feature learning module, an FNO (Fourier neural operator). The FNO also performs a coordinate transform from spherical to Cartesian coordinates. Finally, a multi-scale feature learning module, a 3D U-Net, outputs the reconstructed 3D volumetric image $P$.
  • Figure 3: Results on simulated data. a, Visualization of 3D reconstructions from different methods: the proposed Pano, Denoiser [choi2023DLPACT], and UBP (universal back-projection) [xu2005universal]. We visualize with MAP (maximum amplitude projection along the z-axis) and observe that Pano reconstructs 3D images with high fidelity. A zoomed-in view is provided at the bottom right of each subfigure for easier visualization. We consider both the uniform subsampling setting and the limited-angle reconstruction ($\frac{1}{3}$ in full elevation) setting, as shown in Figure 1b. We use the HSV color space for color coding: axial depth is encoded as hue, and the normalized PA (photoacoustic) amplitude is encoded as value. b, Quantitative evaluation (cosine similarity, PSNR, NMSE) across subsampling rates $6{\times}$--$20{\times}$. Pano matches the Denoiser at $6{\times}$ ($-0.4$ percentage points) and progressively outperforms it at higher acceleration, reaching a 14.4-percentage-point advantage at $20{\times}$, confirming that end-to-end operator learning degrades more gracefully than the two-step paradigm under aggressive subsampling.
  • Figure 4: Results on real data. a, Visualization of 3D reconstructions from different methods: PA MAP images of the proposed Pano, Denoiser [choi2023DLPACT], and UBP (universal back-projection) [xu2005universal]. We consider both the uniform subsampling setting and the limited-angle reconstruction ($\frac{1}{3}$ in full elevation) setting, as shown in Figure 1b. Compared to the other methods, Pano reconstructs 3D images with improved fidelity, preserving details with less noise. We use the HSV color space: axial depth is encoded as hue, and the normalized PA (photoacoustic) amplitude is encoded as value. b, Quantitative evaluation (cosine similarity, PSNR, NMSE) on real phantom data. Pano outperforms UBP and the Denoiser at all four subsampling rates, with the largest margin at $15{\times}$ (28.4 percentage points over the Denoiser), validating the sim-to-real generalizability of the proposed approach.
  • Figure 5: Analysis and ablation study. a, Performance of all methods with different subsampling patterns under 3$\times$ acceleration. Pano outperforms the baselines in the limited azimuth and elevation settings and is on par with the Denoiser [choi2023DLPACT] under the uniform sampling setting. b, Comparison with the iterative solver [xu2002exact] on simulated data. With uniform subsampling, the iterative solver improves on UBP [xu2005universal], at the cost of approximately 10$\times$ slower inference. c, Ablation study of different Pano components. Removing the FNO block leads to a dramatic performance drop (55.2%), indicating the power of the FNO's global feature learning. d, Performance under different DISCO kernel bases at $20{\times}$ subsampling. The Zernike basis achieves the best cosine similarity (77.1%), followed by wavelet and piecewise linear (71.4%). e, Ablation study of the physics loss. Adding the physics loss improves PACT reconstruction performance under different subsampling rates. f, Comparison of spherical and 2D DISCO. 2D DISCO directly projects the spherical coordinates onto a Cartesian grid and leads to a 2% performance drop compared to the spherical DISCO used in the proposed Pano.
  • ...and 1 more figure
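The captions for Figures 1c and 2b describe a cycle-consistency check: the reconstruction $P$ is projected back through the imaging model $\Psi = AP$, and a physics loss penalizes disagreement with the measured $\Psi$. The NumPy sketch below illustrates that idea only. The dense random matrix `A` is a toy stand-in for the Helmholtz-based forward operator, and the relative squared-error form, the `total_loss` helper, and the `weight` parameter are all illustrative assumptions, not the paper's formulation.

```python
import numpy as np


def physics_loss(forward_op, recon, measured):
    """Relative squared mismatch between re-projected and measured PA data."""
    reprojected = forward_op @ recon.ravel()
    residual = np.sum(np.abs(reprojected - measured) ** 2)
    return float(residual / np.sum(np.abs(measured) ** 2))


def total_loss(forward_op, recon, target, measured, weight=0.1):
    """Data term plus weighted physics term (the weight value is hypothetical)."""
    data_term = float(np.mean((recon - target) ** 2))
    return data_term + weight * physics_loss(forward_op, recon, measured)


rng = np.random.default_rng(0)
shape = (4, 4, 4)                  # tiny toy volume
n_voxels = int(np.prod(shape))
n_measurements = 32                # fewer measurements than voxels (sparse sampling)

# Toy linear forward operator: NOT the Helmholtz-based operator from the paper.
A = rng.standard_normal((n_measurements, n_voxels))

P_true = rng.random(shape)         # ground-truth volume
Psi = A @ P_true.ravel()           # simulated measurements, Psi = A P

P_good = P_true + 0.01 * rng.standard_normal(shape)  # close reconstruction
P_bad = P_true + 0.5 * rng.standard_normal(shape)    # poor reconstruction

loss_exact = physics_loss(A, P_true, Psi)
loss_good = physics_loss(A, P_good, Psi)
loss_bad = physics_loss(A, P_bad, Psi)
print(f"physics loss: exact={loss_exact:.2e}, good={loss_good:.2e}, bad={loss_bad:.2e}")
```

The physics term needs only the measurements, not a ground-truth volume, so it acts as a consistency check on the reconstruction. Because the toy system has fewer measurements than voxels, a small physics loss alone does not pin down the volume, which is one reason to pair it with a data term as in `total_loss`.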