Transforming Surgical Interventions with Embodied Intelligence for Ultrasound Robotics
Huan Xu, Jinlin Wu, Guanglin Cao, Zhen Chen, Zhen Lei, Hongbin Liu
TL;DR
The paper tackles the problem of limited instruction understanding and dynamic execution in ultrasound robotics. It presents an Ultrasound Embodied Intelligence system that fuses ultrasound robots with large language models, augmented by Ultrasound Domain Knowledge Augmenting, Ultrasound Assistant Prompt, and a ReAct-inspired Robot Dynamic Execution loop. Through ablation studies and model comparisons on synthetic data, the approach shows significant gains in initiating API calls and completing scanning tasks, with Mixtral-8x7B-Instruct-v0.1 achieving strong early-step performance. The work demonstrates that integrating domain-specific knowledge with structured prompts and real-time execution can enable more autonomous, high-quality ultrasound scans and streamline medical workflows.
Abstract
Ultrasonography has revolutionized non-invasive diagnostic methodologies, significantly enhancing patient outcomes across various medical domains. Despite these advances, integrating ultrasound technology with robotic systems for automated scans presents challenges, including limited command understanding and dynamic execution capabilities. To address these challenges, this paper introduces a novel Ultrasound Embodied Intelligence system that combines ultrasound robots with large language models (LLMs) and domain-specific knowledge augmentation, enhancing ultrasound robots' intelligence and operational efficiency. Our approach employs a dual strategy: first, integrating LLMs with ultrasound robots to translate doctors' verbal instructions into precise motion plans through a comprehensive understanding of ultrasound domain knowledge, including APIs and operational manuals; second, incorporating a dynamic execution mechanism that allows real-time adjustments to scanning plans in response to patient movements or procedural errors. We demonstrate the effectiveness of our system through extensive experiments, including ablation studies and comparisons across various models, which show significant improvements in executing medical procedures from verbal commands. Our findings suggest that the proposed system improves the efficiency and quality of ultrasound scans and paves the way for further advancements in autonomous medical scanning technologies, with the potential to transform non-invasive diagnostics and streamline medical workflows.
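The dynamic execution mechanism described in the abstract can be illustrated with a minimal ReAct-style loop: the planner proposes a thought and an action, the robot executes the action, and the resulting observation is fed back so the plan can be adjusted after an error such as patient movement. The sketch below is an illustration under stated assumptions, not the authors' implementation. `MockUltrasoundRobot`, its `move_to` and `scan` methods, `scripted_policy`, and `run_episode` are all hypothetical names, and the rule-based policy stands in for the LLM so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# One loop step: (thought, action name, observation returned by the robot).
Step = Tuple[str, str, str]


@dataclass
class MockUltrasoundRobot:
    """Hypothetical stand-in for a robot API; not the paper's interface."""
    position: str = "home"
    patient_moved: bool = False  # simulates one disturbance during scanning
    log: List[str] = field(default_factory=list)

    def move_to(self, region: str) -> str:
        self.position = region
        self.log.append(f"move_to({region})")
        return f"probe at {region}"

    def scan(self, region: str) -> str:
        self.log.append(f"scan({region})")
        if self.patient_moved:
            self.patient_moved = False  # the disturbance happens only once
            return "error: patient moved, probe lost contact"
        if self.position != region:
            return f"error: probe not at {region}"
        return f"scan of {region} complete"


def scripted_policy(instruction: str, history: List[Step]) -> Tuple[str, str, str]:
    """Rule-based placeholder for the LLM (assumption: a real system would
    prompt a model with the instruction, domain knowledge, and history)."""
    region = instruction.split()[-1]  # toy parsing: last word is the target
    if not history:
        return ("Position the probe first.", "move_to", region)
    _, last_action, last_obs = history[-1]
    if last_obs.startswith("error"):
        return ("The last step failed; reposition the probe.", "move_to", region)
    if last_action == "move_to":
        return ("Probe is in place; start scanning.", "scan", region)
    return ("The scan succeeded.", "finish", region)


def run_episode(
    instruction: str,
    robot: MockUltrasoundRobot,
    policy: Callable[[str, List[Step]], Tuple[str, str, str]],
    max_steps: int = 8,
) -> List[Step]:
    """Alternate thought/action and observation until the policy finishes."""
    tools = {"move_to": robot.move_to, "scan": robot.scan}
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, arg = policy(instruction, history)
        if action == "finish":
            break
        observation = tools[action](arg)  # execute the chosen API call
        history.append((thought, action, observation))
    return history


if __name__ == "__main__":
    robot = MockUltrasoundRobot(patient_moved=True)
    for thought, action, obs in run_episode("scan the thyroid", robot, scripted_policy):
        print(f"Thought: {thought}\nAction: {action}\nObservation: {obs}\n")
```

In this toy run the first scan fails because the simulated patient moves, the observation is returned to the planner, and the probe is repositioned before a second scan. That feedback path is the point of the sketch; the prompts, API surface, and recovery logic of the real system are not specified in the abstract.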
