Enhancing Free-hand 3D Photoacoustic and Ultrasound Reconstruction using Deep Learning
SiYeoul Lee, SeonHo Kim, Minkyung Seo, SeongKyu Park, Salehin Imrus, Kambaluru Ashok, DongEon Lee, Chunsu Park, SeonYeong Lee, Jiye Kim, Jae-Heung Yoo, MinWoo Kim
TL;DR
This work tackles sensorless freehand 3D photoacoustic and ultrasound (PAUS) reconstruction by introducing MoGLo-Net, a motion-based learning network that combines a global-local self-attention module with a correlation volume to estimate six-degree-of-freedom motion from sequential B-mode frames. The method integrates a patch-wise correlation operation, dual RNN-based motion estimators, and three specialized losses (MMAE, correlation loss, and margin triplet loss) to achieve accurate, drift-resistant 3D reconstructions, and it is extended to Doppler and photoacoustic imaging for vascular visualization. Experiments on in-house and open datasets show state-of-the-art performance and real-time inference, and ablation studies isolate the contributions of global-local attention, correlation information, and motion-based supervision. The approach has potential for clinically practical freehand PAUS imaging: it enables 3D vascular visualization without external tracking hardware and applies across ultrasound, Doppler, and photoacoustic modalities.
Abstract
This study introduces a motion-based learning network with a global-local self-attention module (MoGLo-Net) to enhance 3D reconstruction in handheld photoacoustic and ultrasound (PAUS) imaging. Standard PAUS imaging is often limited by a narrow field of view and an inability to effectively visualize complex 3D structures. The freehand 3D technique, which aligns sequential 2D images to reconstruct a 3D volume, must estimate the probe motion accurately, and doing so without external positional sensors is a significant challenge. MoGLo-Net addresses these limitations by adapting the self-attention mechanism to exploit critical regions within successive ultrasound images, such as fully developed speckle or highly echogenic tissue, to accurately estimate motion parameters. This facilitates the extraction of intricate features from individual frames. Additionally, we designed a patch-wise correlation operation to generate a correlation volume that is highly correlated with the scanning motion. A custom loss function was also developed to ensure robust learning with minimized bias, leveraging the characteristics of the motion parameters. Experimental evaluations demonstrated that MoGLo-Net surpasses current state-of-the-art methods in both quantitative and qualitative performance metrics. Furthermore, we expanded the application of 3D reconstruction technology beyond simple B-mode ultrasound volumes to incorporate Doppler ultrasound and photoacoustic imaging, enabling 3D visualization of vasculature. The source code for this study is publicly available at: https://github.com/guhong3648/US3D
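
To make the idea of a correlation volume concrete, the following is a minimal NumPy sketch of one common way to correlate the feature maps of two consecutive frames over a small window of displacements, in the style of optical-flow correlation layers. It is not taken from the authors' implementation (see the repository linked above for that). The function name `correlation_volume`, the `max_disp` parameter, the `(C, H, W)` feature layout, the zero padding, and the normalization by channel count are all assumptions made for illustration, and the paper's patch-wise operation may differ in its details.

```python
import numpy as np


def correlation_volume(feat_a, feat_b, max_disp=2):
    """Local correlation between two feature maps (illustrative sketch only).

    feat_a, feat_b: arrays of shape (C, H, W), e.g. encoder features of two
    consecutive B-mode frames (assumed layout).
    Returns an array of shape ((2 * max_disp + 1) ** 2, H, W). Channel k
    corresponds to displacement (dy, dx), with
    k = (dy + max_disp) * (2 * max_disp + 1) + (dx + max_disp).
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("feature maps must have the same shape")
    c, h, w = feat_a.shape
    d = max_disp

    # Zero-pad the second map so every displaced window stays in bounds.
    padded = np.pad(feat_b, ((0, 0), (d, d), (d, d)))

    out = np.empty(((2 * d + 1) ** 2, h, w), dtype=np.float64)
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # shifted[:, i, j] equals feat_b[:, i + dy, j + dx] (zero outside).
            shifted = padded[:, d + dy:d + dy + h, d + dx:d + dx + w]
            # Channel-wise dot product, normalized by the channel count.
            out[k] = (feat_a * shifted).sum(axis=0) / c
            k += 1
    return out
```

If the second feature map is a shifted copy of the first, the channel for that shift carries the largest average response, which is the sense in which such a volume encodes in-plane motion between frames. A learned motion estimator would consume this volume as input; how MoGLo-Net does so is described in the paper, not in this sketch.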
