Efficient Deep Model-Based Optoacoustic Image Reconstruction

Christoph Dehner, Guillaume Zahnd

TL;DR

The paper addresses the need for real-time, high-quality optoacoustic reconstruction with lower hardware costs in multispectral optoacoustic tomography (MSOT). It introduces EfficientDeepMB, a lightweight encoder–decoder network based on EfficientNet, using MBConv inverted residual blocks and a U-Net decoder, trained with synthetic data to approximate model-based reconstructions and compiled to ONNX for speed. Compared with the baseline DeepMB, EfficientDeepMB achieves similar accuracy (data residuals, MAE, PSNR, SSIM) but runs on mid-range GPUs at up to 59 Hz (live feedback), representing a 3–5x speedup on the same class of devices. This work demonstrates the potential for MSOT miniaturization and broader clinical translation by reducing computational and hardware requirements without compromising image quality.

Abstract

Clinical adoption of multispectral optoacoustic tomography necessitates improvements in the image quality available in real-time, as well as a reduction in the financial cost of the scanner. Deep learning approaches have recently unlocked the reconstruction of high-quality optoacoustic images in real-time. However, currently used deep neural network architectures require powerful graphics processing units to infer images at sufficiently high frame-rates, which greatly increases the price tag. Herein we propose EfficientDeepMB, a relatively lightweight (17M parameters) network architecture achieving high frame-rates on medium-sized graphics cards with no noticeable downgrade in image quality. EfficientDeepMB is built upon DeepMB, a previously established deep learning framework to reconstruct high-quality images in real-time, and upon EfficientNet, a network architecture designed to operate on mobile devices. We demonstrate the performance of EfficientDeepMB in terms of reconstruction speed and accuracy using a large and diverse dataset of in vivo optoacoustic scans. EfficientDeepMB is about three to five times faster than DeepMB: deployed on a medium-sized NVIDIA RTX A2000 Ada, EfficientDeepMB reconstructs images at speeds enabling live image feedback (59 Hz) while DeepMB fails to meet the real-time inference threshold (14 Hz). The quantitative difference between the reconstruction accuracy of EfficientDeepMB and DeepMB is marginal (data residual norms of 0.1560 vs. 0.1487, mean absolute error of 0.642 vs. 0.745). There are no perceptible qualitative differences between images inferred with the two reconstruction methods.
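
To put the reported frame rates in perspective, the short sketch below converts them into per-frame time budgets and a speedup ratio. The two rates (59 Hz and 14 Hz on the RTX A2000 Ada) are taken from the abstract; the helper name `frame_period_ms` is only illustrative, and the numerical value of the real-time threshold is not stated here, so it is deliberately left out.

```python
def frame_period_ms(rate_hz: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / rate_hz


# Frame rates reported in the abstract for the NVIDIA RTX A2000 Ada.
efficient_hz = 59.0  # EfficientDeepMB
deepmb_hz = 14.0     # DeepMB

# Ratio of frame rates, i.e. how many times faster one frame is produced.
speedup = efficient_hz / deepmb_hz

print(f"EfficientDeepMB: {frame_period_ms(efficient_hz):.1f} ms per frame")
print(f"DeepMB:          {frame_period_ms(deepmb_hz):.1f} ms per frame")
print(f"Speedup:         {speedup:.1f}x")
```

On this card the ratio comes out at roughly 4.2, which sits inside the "three to five times" range quoted for the comparison as a whole.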

Paper Structure (8 sections, 3 figures, 3 tables)

This paper contains 8 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: EfficientDeepMB network architecture. In the encoding pathway, the seven blocks inspired by EfficientNet are shown in color. The numbers in brackets indicate the tensor shape (channels, height, width). Conv: block comprising a 2D convolution, batch normalization, and SiLU activation. MBConv6: block comprising an inverted residual block (namely, a pointwise convolution block with expansion factor 6 and a depthwise grouped convolution block), a squeeze-and-excitation block with reduction factor 4, a pointwise projection block, and a residual connection block. MBConv1: similar to MBConv6, but without a pointwise convolution block. DoubleConv: traditional U-Net decoder block, composed of a chain of two Conv blocks. The size of all convolution kernels is $3\times3$. Concatenation is applied channel-wise.
  • Figure 2: Comparison of the inference time between EfficientDeepMB and DeepMB, for different graphics cards. The dashed line represents the threshold for real-time imaging.
  • Figure 3: Representative examples of optoacoustic images from the in vivo test dataset for different anatomical locations, reconstructed with EfficientDeepMB (a, f, k, p), DeepMB (b, g, l, q), and model-based (MB) (c, h, m, r). The last two columns show the mean absolute difference between EfficientDeepMB and DeepMB (d, i, n, s), as well as between EfficientDeepMB and MB (e, j, o, t). For each row, the value within brackets indicates the laser wavelength.
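
The MBConv block described in the caption of Figure 1 can be made more concrete with a rough count of its convolution weights. The sketch below is a back-of-the-envelope estimate, not the authors' implementation. It makes several assumptions: pointwise convolutions use 1x1 kernels (the usual EfficientNet convention) while the 3x3 kernel applies to the depthwise convolution; the squeeze-and-excitation width is the block input width divided by the reduction factor; and biases and batch normalization parameters are ignored. The function name `mbconv_weights` and the channel counts in the usage lines are hypothetical and are not taken from the paper.

```python
def mbconv_weights(c_in: int, c_out: int, expansion: int = 6,
                   kernel: int = 3, se_reduction: int = 4) -> dict:
    """Approximate convolution weight count of one MBConv block.

    Biases and batch-norm parameters are ignored (assumption).
    """
    c_mid = c_in * expansion  # width after the expansion step

    # 1x1 pointwise expansion; absent when expansion == 1 (MBConv1).
    expand = c_in * c_mid if expansion != 1 else 0

    # Depthwise convolution: one kernel x kernel filter per channel.
    depthwise = kernel * kernel * c_mid

    # Squeeze-and-excitation: two 1x1 convolutions, c_mid -> c_se -> c_mid.
    # Assumption: c_se is derived from the block input width.
    c_se = max(1, c_in // se_reduction)
    squeeze_excite = 2 * c_mid * c_se

    # 1x1 pointwise projection down to the output width.
    project = c_mid * c_out

    parts = {
        "expand": expand,
        "depthwise": depthwise,
        "squeeze_excite": squeeze_excite,
        "project": project,
    }
    parts["total"] = sum(parts.values())
    return parts


# Hypothetical channel counts, chosen only for illustration.
mbconv6 = mbconv_weights(c_in=16, c_out=24, expansion=6)
mbconv1 = mbconv_weights(c_in=32, c_out=16, expansion=1)

for name, parts in (("MBConv6", mbconv6), ("MBConv1", mbconv1)):
    print(name, parts)
```

Under these assumptions the depthwise step contributes only a small share of the weights, while the pointwise expansion and projection dominate. The residual connection adds no weights.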