UltraDP: Generalizable Carotid Ultrasound Scanning with Force-Aware Diffusion Policy

Ruoqu Chen, Xiangjie Yan, Kangchen Lv, Gao Huang, Zheng Li, Xiang Li

TL;DR

This work addresses the generalization bottleneck in autonomous carotid ultrasound scanning by introducing UltraDP, a diffusion-policy framework that fuses multi-modal inputs (ultrasound images $U$, wrist-camera data $I$, contact wrench $w$, and probe pose $x$) to predict safe, smooth navigation actions. A specialized guidance module centers the carotid artery in the image, while a high-frequency hybrid force-impedance controller ensures safe physical interaction with subjects. The authors construct a large real-world dataset (210 scans, 460k samples) and demonstrate that UltraDP generalizes well to unseen subjects, outperforming rule-based and behavior-cloning (BC) baselines in real-world trials. The approach improves generalization, safety, and efficiency in human-in-the-loop scanning, with potential to scale to broader anatomical regions and imaging tasks.

Abstract

Ultrasound scanning is a critical imaging technique for real-time, non-invasive diagnostics. However, variations in patient anatomy and complex human-in-the-loop interactions pose significant challenges for autonomous robotic scanning. Existing ultrasound scanning robots commonly suffer from limited generalization and inefficient data utilization. To overcome these limitations, we present UltraDP, a Diffusion-Policy-based method that receives multi-sensory inputs (ultrasound images, wrist camera images, contact wrench, and probe pose) and generates actions that capture the multi-modal action distributions arising in autonomous ultrasound scanning of the carotid artery. We propose a specialized guidance module that enables the policy to output actions that center the artery in the ultrasound images. To ensure stable contact and safe interaction between the robot and the human subject, a hybrid force-impedance controller drives the robot to track the generated trajectories. We have also built a large-scale training dataset for carotid scanning comprising 210 scans with 460k sample pairs from 21 volunteers of both genders. By leveraging our guidance module and DP's strong generalization ability, UltraDP achieves a 95% success rate in transverse scanning on previously unseen subjects, demonstrating its effectiveness.
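The abstract does not spell out the hybrid force-impedance controller, so the following is only a minimal sketch of the general idea, assuming a textbook scheme: the contact force is regulated along one axis (here taken to be the probe axis), while a spring-damper impedance law tracks the desired position in the remaining directions. The function name, gains, and the choice of a proportional force correction are all hypothetical and are not taken from the paper.

```python
import numpy as np

def hybrid_force_impedance(x, x_des, v, f_meas, f_des, axis,
                           stiffness=600.0, damping=40.0, force_gain=0.5):
    """Return a commanded Cartesian force (3-vector, in newtons).

    Hypothetical illustration, not the authors' controller.
    x, x_des : current and desired probe position (m)
    v        : current probe velocity (m/s)
    f_meas   : measured contact force along `axis` (N, scalar)
    f_des    : desired contact force along `axis` (N, scalar)
    axis     : direction in which force is regulated
    """
    n = np.asarray(axis, dtype=float)
    n = n / np.linalg.norm(n)
    p_force = np.outer(n, n)          # projector onto the force-controlled direction
    p_motion = np.eye(3) - p_force    # projector onto the impedance-controlled plane

    # Impedance term: virtual spring toward x_des plus damping on velocity.
    pos_err = np.asarray(x_des, dtype=float) - np.asarray(x, dtype=float)
    f_impedance = stiffness * pos_err - damping * np.asarray(v, dtype=float)

    # Force term: feedforward of the desired force plus a proportional correction.
    f_contact = (f_des + force_gain * (f_des - f_meas)) * n

    # Position error along the force axis is projected out, so it cannot
    # fight the force regulation.
    return p_motion @ f_impedance + f_contact


if __name__ == "__main__":
    cmd = hybrid_force_impedance(
        x=[0.0, 0.0, 0.0], x_des=[0.01, 0.0, 0.05], v=[0.0, 0.0, 0.0],
        f_meas=3.0, f_des=5.0, axis=[0.0, 0.0, 1.0],
    )
    print(cmd)
```

In this sketch the 5 cm position error along the force axis is ignored on purpose: only the lateral error contributes a spring force, while the axial command depends solely on the force error. That separation is what lets such a controller hold steady contact pressure while the policy moves the probe along the neck.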

Paper Structure

This paper contains 15 sections, 8 equations, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Demonstration of carotid artery ultrasound scanning task. Top: traditional sonographer. Bottom: the ultrasound robot with the proposed method UltraDP, which takes ultrasound images, depth images, contact wrench, and probe pose as input and outputs the desired pose and contact wrench.
  • Figure 2: (a) Probe positions and the corresponding ultrasound images for transverse section and at the end of scanning. (b) Illustration of ultrasound imaging principle.
  • Figure 3: The structure of UltraDP, including data collection, pretraining, the navigation system, and the control system. In the navigation system module, only the inference process is illustrated: the "ResNet" block within the dashed box denotes the network trained starting from the pretrained parameters, and the "ResNet" block with a "frozen" sign within the dashed box is identical to the one in the pretraining module.
  • Figure 4: Demonstration of the experiment setup, which consists of a Franka manipulator, an ultrasound machine with a probe, an ATI mini 40 force/torque sensor connected between the arm flange and the ultrasound probe, and a RealSense D405 camera mounted on the arm flange.
  • Figure 5: Real-world experiment: snapshots and ultrasound images. The red lines are the output of our pretrained regressor; no red line means the network could not detect the artery position. First row: UltraDP. (a) The policy began with the artery on the right of the image; (b) the policy output actions that guided the robot to center the artery while moving upwards; (c) the policy detected the bifurcation of the artery; (d) the external and internal arteries were clear in the image, and the scan ended. Second row: baseline 1, behavior cloning (BC). (e)-(h) The BC policy did not center the artery, and in the end it drove the probe away from the neck, showing poor generalization. Third row: baseline 2, visual servoing (VS). (i)-(l) The VS controller was able to center the artery; however, because some parameters such as offset_z did not suit the female subject, the probe detached from her neck, and the regressor could not work once the image was incomplete. In the end the probe kept moving up, lost the image, and hit her in the jaw.
  • ...and 3 more figures