Image-Guided Navigation of a Robotic Ultrasound Probe for Autonomous Spinal Sonography Using a Shadow-aware Dual-Agent Framework

Keyu Li, Yangxin Xu, Jian Wang, Dong Ni, Li Liu, Max Q. -H. Meng

TL;DR

A novel dual-agent framework is proposed that integrates a reinforcement learning (RL) agent and a deep learning (DL) agent to jointly determine the movement of the US probe based on the real-time US images, in order to mimic the decision-making process of an expert sonographer to achieve autonomous standard view acquisitions in spinal sonography.

Abstract

Ultrasound (US) imaging is commonly used to assist in the diagnosis and interventions of spine diseases, while the standardized US acquisitions performed by manually operating the probe require substantial experience and training of sonographers. In this work, we propose a novel dual-agent framework that integrates a reinforcement learning (RL) agent and a deep learning (DL) agent to jointly determine the movement of the US probe based on the real-time US images, in order to mimic the decision-making process of an expert sonographer to achieve autonomous standard view acquisitions in spinal sonography. Moreover, inspired by the nature of US propagation and the characteristics of the spinal anatomy, we introduce a view-specific acoustic shadow reward to utilize the shadow information to implicitly guide the navigation of the probe toward different standard views of the spine. Our method is validated in both quantitative and qualitative experiments in a simulation environment built with US data acquired from 17 volunteers. The average navigation accuracy toward different standard views achieves 5.18mm/5.25deg and 12.87mm/17.49deg in the intra- and inter-subject settings, respectively. The results demonstrate that our method can effectively interpret the US images and navigate the probe to acquire multiple standard views of the spine.
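The accuracy figures above pair a position error in millimetres with an orientation error in degrees. As a rough illustration only, the sketch below shows one common way to compute such a pair from two 4x4 homogeneous transforms (current probe pose and goal pose). The function names `pose_error` and `make_pose`, and the choice of the geodesic rotation angle as the orientation error, are assumptions made for this example; the abstract does not state the exact metric the authors use.

```python
import numpy as np


def pose_error(T_current, T_goal):
    """Return (position error, orientation error in degrees) between two poses.

    Both inputs are 4x4 homogeneous transforms expressed in the same frame.
    The position error has the same unit as the translations (e.g. mm).
    NOTE: this metric is an assumption for illustration, not the paper's.
    """
    T_current = np.asarray(T_current, dtype=float)
    T_goal = np.asarray(T_goal, dtype=float)

    # Euclidean distance between the two origins.
    pos_err = float(np.linalg.norm(T_goal[:3, 3] - T_current[:3, 3]))

    # Relative rotation, then its rotation angle from the trace.
    R_rel = T_current[:3, :3].T @ T_goal[:3, :3]
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = float(np.degrees(np.arccos(cos_angle)))
    return pos_err, rot_err_deg


def make_pose(yaw_deg, translation):
    """Hypothetical helper: pose rotated about z by yaw_deg, then translated."""
    a = np.radians(yaw_deg)
    T = np.eye(4)
    T[:3, :3] = [[np.cos(a), -np.sin(a), 0.0],
                 [np.sin(a), np.cos(a), 0.0],
                 [0.0, 0.0, 1.0]]
    T[:3, 3] = translation
    return T


# Usage: a goal 5 mm away and rotated by 30 degrees about z.
T_cur = make_pose(0.0, [0.0, 0.0, 0.0])
T_goal = make_pose(30.0, [3.0, 4.0, 0.0])
pos_mm, rot_deg = pose_error(T_cur, T_goal)
```

Averaging these two numbers over many test episodes would give a summary in the same mm/deg form as the reported results.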

Paper Structure

This paper contains 24 sections, 12 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: (a) Lumbar spine anatomy and (b) US acquisitions of three standard views of the lumbar spine, i.e., PSL: paramedian sagittal lamina view, PSAP: paramedian sagittal articular process view, and TSP: transverse spinal process view of the spine. The left column illustrates the corresponding probe poses. The middle column presents the B-mode images acquired by a clinician from a volunteer. The right column shows the corresponding images acquired with the same probe poses from the virtual patient in our simulation.
  • Figure 2: Overview of the presented method for autonomous standard view acquisition in robotic spinal sonography. (a) shows the real-world system configuration, where a US probe is controlled by a robotic arm to scan the patient in the prone position. (b) Given the acquired US image as input, the RL agent selects the best navigation action based on the SonoQNet to control the 5-DOF movement of the probe. The US confidence map is computed from the US image to calculate the (c) view-specific acoustic shadow reward, which is used in combination with the navigation reward to train the RL agent. Meanwhile, (d) a pre-trained DL agent recognizes the standard views from the US image and jointly determines the movement of the probe under the safety-related environment constraints. The objective of the proposed framework is to automatically acquire three standard views of the lumbar spine (PSL, PSAP and TSP views).
  • Figure 3: Illustration of the simulation environment for robotic spinal sonography. The virtual patients in our simulation are reconstructed 3D volumes of the lumbar spine, and the virtual probe is modeled as a commonly used 2D probe with a square field-of-view. The imaging plane of the probe is set as the $y$-$z$ plane of the probe frame $\{P\}$. The world frame $\{W\}$ is attached to the robot base. The current probe pose and the goal probe pose associated with the target standard view are represented by the transformations $\prescript{W}{P}{\mathbf{T}}$ and $\prescript{W}{G}{\mathbf{T}}$, and the corresponding US images are denoted by $I$ and $I_g$, respectively.
  • Figure 4: Schematic illustration of the SonoQNet architecture for navigation action selection. The inputs are the $4$ most recently acquired US images of size $150\times 150$. The outputs are the predicted Q-values for the $10$ navigation actions, and the agent selects the action with the highest Q-value. The feature extractor contains $13$ convolutional layers (blue), each followed by batch normalization and ReLU activation. Max pooling (yellow) is performed after the first $4$ convolutional blocks with a filter size of $2\times2$ and a stride of $2$. The size of each feature map is denoted above the blocks. The outputs of the last convolution+BN block (green) are $10$ class score maps associated with the $10$ navigation actions, which are finally aggregated by global average pooling (GAP) to approximate the Q-values.
  • Figure 5: (a)-(c) show the US images and the corresponding confidence maps of the PSL view, PSAP view and TSP view of the spine. The acoustic shadow can be seen in the images as the area below the yellow dotted line. (d) illustrates the proposed ROI candidates to quantitatively measure the shadow area.
  • ...and 7 more figures
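
The Figure 4 caption describes the last step of SonoQNet: $10$ class score maps are aggregated by global average pooling (GAP) into $10$ Q-values, and the action with the highest Q-value is selected. The NumPy sketch below illustrates only that final step. The function names and the $9\times 9$ spatial size of the score maps are assumptions made for this example, and the random score maps stand in for the output of a trained network.

```python
import numpy as np

NUM_ACTIONS = 10  # number of navigation actions stated in the caption


def q_values_from_score_maps(score_maps):
    """Global average pooling: (num_actions, H, W) -> (num_actions,)."""
    score_maps = np.asarray(score_maps, dtype=float)
    if score_maps.ndim != 3:
        raise ValueError("expected an array of shape (num_actions, H, W)")
    # Average each score map over its spatial dimensions.
    return score_maps.mean(axis=(1, 2))


def select_greedy_action(score_maps):
    """Return the index of the action with the highest pooled Q-value."""
    return int(np.argmax(q_values_from_score_maps(score_maps)))


# Usage with placeholder score maps (the 9x9 size is hypothetical).
rng = np.random.default_rng(0)
demo_maps = rng.normal(size=(NUM_ACTIONS, 9, 9))
demo_maps[3] += 5.0  # make action 3 clearly the best for the demo
demo_q = q_values_from_score_maps(demo_maps)
demo_action = select_greedy_action(demo_maps)
```

Because GAP has no learnable parameters, each Q-value is simply the spatial mean of its own score map, which keeps a direct correspondence between image regions and the action they support.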