Breaking the Latency Barrier: Synergistic Perception and Control for High-Frequency 3D Ultrasound Servoing

Yizhao Qian, Yujie Zhu, Jiayuan Luo, Li Liu, Yixuan Yuan, Guochen Ning, Hongen Liao

TL;DR

This study tackles the latency barrier in dynamic Robotic Ultrasound Systems by introducing a synergistic co-design of perception and control. It combines a Decoupled Dual-Stream Perception Network with a Single-Step Flow Policy to achieve a closed-loop rate of over 60 Hz for 3D translational servoing, validated on a dynamic phantom and in vivo on a human volunteer. The approach demonstrates robust tracking of complex 3D trajectories, fast re-acquisition from large displacements, and efficient sim-to-real transfer using only 50 real trajectories. These results suggest a viable path toward high-bandwidth autonomous ultrasound scanning in dynamic clinical environments, with future work extending to full 6-DoF control and broader clinical validation.

Abstract

Real-time tracking of dynamic targets amid large-scale, high-frequency disturbances remains a critical unsolved challenge in Robotic Ultrasound Systems (RUSS), primarily because of the end-to-end latency of existing systems. This paper argues that breaking this latency barrier requires a fundamental shift toward the synergistic co-design of perception and control. We realize this co-design in a framework with two tightly coupled contributions: (1) a Decoupled Dual-Stream Perception Network that robustly estimates the 3D translational state from 2D images at high frequency, and (2) a Single-Step Flow Policy that generates an entire action sequence in one inference pass, bypassing the iterative bottleneck of conventional policies. Together, these components enable a closed-loop control frequency exceeding 60 Hz. On a dynamic phantom, our system tracks complex 3D trajectories with a mean error below 6.5 mm and robustly re-acquires the target after displacements of over 170 mm. It also tracks targets moving at 102 mm/s with a terminal error below 1.7 mm. In-vivo experiments on a human volunteer further validate the framework's effectiveness and robustness in a realistic clinical setting. Our work presents a RUSS holistically architected to unify high-bandwidth tracking with large-scale repositioning, a critical step toward robust autonomy in dynamic clinical environments.
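The latency argument above rests on counting network evaluations. An iterative (diffusion-style) sampler calls its network once per denoising step, whereas a single-step flow policy calls it once per action chunk. A minimal sketch, assuming a toy straight-line (rectified) flow whose velocity field is known in closed form, is shown below. The names `toy_velocity`, `sample_action_chunk`, and `steps_within_budget` are hypothetical, the analytic velocity is a stand-in for the authors' learned network, and the 5 ms per-evaluation cost in the usage lines is an assumed number, not a measurement from the paper. At 60 Hz, the whole perception-plus-control cycle has about 1000/60 ≈ 16.7 ms.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network v(x, t | obs).

    For a straight-line (rectified) flow from noise to a known target action
    chunk, the exact velocity at time t is (target - x) / (1 - t).
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, target)]


def sample_action_chunk(target, num_steps, seed=0):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with `num_steps` Euler steps.

    Returns the sampled action chunk and the number of velocity evaluations,
    which is the quantity that drives inference latency.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # x_0 ~ N(0, I)
    dt = 1.0 / num_steps
    evals = 0
    for k in range(num_steps):
        v = toy_velocity(x, k * dt, target)  # t stays < 1, so no division by zero
        evals += 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, evals


def steps_within_budget(rate_hz, eval_ms, perception_ms=0.0):
    """How many velocity evaluations fit in one control period at rate_hz."""
    budget_ms = 1000.0 / rate_hz - perception_ms
    return max(0, int(budget_ms // eval_ms))


if __name__ == "__main__":
    # Flattened chunk of hypothetical 3D translational actions (2 steps x 3 axes).
    target = [0.5, -0.2, 0.1, 0.0, 0.3, -0.4]
    for n in (1, 10):
        chunk, evals = sample_action_chunk(target, num_steps=n)
        err = max(abs(a - b) for a, b in zip(chunk, target))
        print(f"{n:2d} steps: {evals} evaluations, max error {err:.2e}")
    # The 5 ms per-evaluation cost is an assumption for illustration only.
    print("evaluations per cycle at 60 Hz:", steps_within_budget(60.0, 5.0))
```

Under the straight-line flow assumed here, a single Euler step already lands on the target, so the ten-step sampler only adds evaluations. Near-straight flow paths are one common motivation for single-step flow samplers; whether the authors' policy is trained to obtain them in exactly this way is not stated in this summary.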

Paper Structure

This paper contains 26 sections, 7 equations, 9 figures, and 5 tables.

Figures (9)

  • Figure 1: Overview of the proposed high-frequency visual servoing. (a) Challenge: maintaining the ultrasound view under significant and unpredictable disturbances. (b) Control objective: aligning the live video stream with the target image using robotic manipulation. (c) Outcome: rapid reduction of positional errors (x, y) and maximization of image similarity, quantified by normalized cross-correlation (NCC).
  • Figure 2: Overview of our proposed high-frequency US servoing framework. The system takes a live image stream and a goal image as input. The Vision Front-end, composed of a Decoupled Dual-Stream Network and an Adaptive-UKF, estimates the 3D translational error. This state information is fed to the Flow Policy Network, which generates a short-horizon motion plan executed by the robotic arm in the Physical Environment.
  • Figure 3: The architecture of our Decoupled Dual-Stream Perception Network. The Geometric Stream uses a cost volume to estimate in-plane motion (X and Z axes) from low-level features. Concurrently, the Semantic Stream infers out-of-plane motion (Y axis) by interpreting higher-level features.
  • Figure 4: Conceptual comparison of policy inference processes. (a) Diffusion Policies rely on an iterative denoising process, requiring multiple steps to generate an action. (b) Flow Policy enables single-step inference, drastically reducing latency and enabling high-frequency control.
  • Figure 5: Overview of the experimental setup, showing the UR3e manipulator, the CIRS phantom and the US system.
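Figure 1 quantifies image alignment with normalized cross-correlation (NCC), and the Geometric Stream in Figure 3 estimates in-plane motion from a cost volume. As a toy illustration of both ideas, the sketch below computes zero-mean NCC between two images, builds a small cost volume by scoring every candidate in-plane shift with NCC, and picks the best-scoring shift. This is a simplification under stated assumptions, not the authors' network: it matches raw pixels rather than learned low-level features, searches a single global integer shift rather than dense displacements, and assumes image rows map to the Z (depth) axis and columns to the X (lateral) axis. The helper names `ncc`, `crop`, `shift_cost_volume`, and `best_shift` are hypothetical.

```python
import math
import random


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size 2D images (lists of rows)."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    n = len(fa)
    ma, mb = sum(fa) / n, sum(fb) / n
    da = [v - ma for v in fa]
    db = [v - mb for v in fb]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0  # constant image: define NCC as 0


def crop(img, top, left, h, w):
    """Return the h x w window of img whose top-left corner is (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]


def shift_cost_volume(live, goal, max_shift):
    """Score every integer shift (dz, dx) in [-max_shift, max_shift]^2.

    The central window of `goal` is compared, via NCC, with the window of
    `live` displaced by (dz, dx). Rows are assumed to be Z (depth) and
    columns X (lateral); this is a toy global-shift cost volume.
    """
    H, W = len(goal), len(goal[0])
    h, w = H - 2 * max_shift, W - 2 * max_shift
    ref = crop(goal, max_shift, max_shift, h, w)
    volume = {}
    for dz in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            window = crop(live, max_shift + dz, max_shift + dx, h, w)
            volume[(dz, dx)] = ncc(ref, window)
    return volume


def best_shift(volume):
    """Displacement with the highest NCC score."""
    return max(volume, key=volume.get)


if __name__ == "__main__":
    rng = random.Random(0)
    H = W = 12
    goal = [[rng.random() for _ in range(W)] for _ in range(H)]
    # Synthetic live frame: goal content moved down 1 row and right 2 columns.
    true_dz, true_dx = 1, 2
    live = [[goal[(r - true_dz) % H][(c - true_dx) % W] for c in range(W)]
            for r in range(H)]
    vol = shift_cost_volume(live, goal, max_shift=3)
    est = best_shift(vol)
    print("estimated shift:", est, "NCC:", round(vol[est], 3))
```

Keeping one score per candidate displacement makes the search explicit and bounded; a learned cost volume instead correlates feature maps and lets later layers regress sub-pixel motion from the scores.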