USPilot: An Embodied Robotic Assistant Ultrasound System with Large Language Model Enhanced Graph Planner
Mingcong Chen, Siqi Fan, Guanglin Cao, Yun-hui Liu, Hongbin Liu
TL;DR
USPilot addresses the global shortage of skilled sonographers with an embodied robotic ultrasound system guided by an LLM-enhanced graph planner. A semantic router interprets each user query and routes it either to the ultrasound knowledge adapters or to the LLM-enhanced graph (LLMEG) planning module, which combines a GNN encoder-decoder with a subgraph generator to plan API sequences for autonomous scanning. Key contributions are the LLM-enhanced Graph Neural Network for API selection, adapter-based ultrasound knowledge embedding, and a real-world robotic ultrasound demonstration of autonomous scanning and question answering. The results indicate improved task-planning accuracy and practical potential for unmanned medical imaging. Remaining challenges include generalization to unseen body parts and dependence on a fixed API set, which suggest avenues for future multimodal, real-time control integration and back-end generalization.
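The routing step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the route names (`qa`, `plan`), the example utterances, and the bag-of-words similarity are all assumptions chosen to keep the example self-contained. A real semantic router would most likely compare learned sentence embeddings instead.

```python
import math
from collections import Counter

# Hypothetical routes and example utterances (assumptions, not from the paper).
ROUTES = {
    "qa": ["what is an ultrasound", "is the scan safe", "how long does the exam take"],
    "plan": ["scan my thyroid", "perform a carotid ultrasound", "start the scan of the liver"],
}

def embed(text):
    """Toy stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def route(query):
    """Return the route whose closest example utterance best matches the query."""
    q = embed(query)
    scores = {
        name: max(cosine(q, embed(utterance)) for utterance in utterances)
        for name, utterances in ROUTES.items()
    }
    return max(scores, key=scores.get)
```

In this sketch, a request such as "please scan my thyroid" would be sent to the planner route, while "is the ultrasound scan safe" would go to the question-answering route.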
Abstract
In the era of Large Language Models (LLMs), embodied artificial intelligence presents transformative opportunities for robotic manipulation tasks. Ultrasound imaging, a widely used and cost-effective medical diagnostic procedure, faces challenges due to the global shortage of professional sonographers. To address this issue, we propose USPilot, an embodied robotic assistant ultrasound system powered by an LLM-based framework to enable autonomous ultrasound acquisition. USPilot is designed to function as a virtual sonographer, capable of responding to patients' ultrasound-related queries and performing ultrasound scans based on user intent. With a fine-tuned LLM, USPilot demonstrates a deep understanding of ultrasound-specific questions and tasks. Furthermore, USPilot incorporates an LLM-enhanced Graph Neural Network (GNN) that manages the ultrasound robotic APIs and serves as a task planner. Experimental results show that the LLM-enhanced GNN achieves unprecedented accuracy in task planning on public datasets. Additionally, the system demonstrates significant potential in autonomously understanding and executing ultrasound procedures. These advancements bring us closer to achieving autonomous and potentially unmanned robotic ultrasound systems, addressing critical resource gaps in medical imaging.
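To make the idea of planning over a graph of robotic APIs concrete, the following toy sketch orders a selected set of APIs so that each one runs after its prerequisites. It is an illustration under stated assumptions, not the USPilot planner: the API names and dependency edges are hypothetical, the set of selected nodes is passed in by hand where the paper uses an LLM-enhanced GNN, and the ordering uses a plain topological sort from the Python standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical API dependency graph: each API maps to the APIs it requires first.
# These names are illustrative assumptions, not the system's real API set.
DEPENDS_ON = {
    "detect_body_part": set(),
    "move_to_start_pose": {"detect_body_part"},
    "apply_contact_force": {"move_to_start_pose"},
    "sweep_probe": {"apply_contact_force"},
    "save_images": {"sweep_probe"},
    "answer_question": set(),
}

def plan(selected):
    """Order the selected APIs so that every prerequisite comes first.

    Only dependencies between selected APIs are kept (the induced subgraph),
    so a prerequisite that was not selected is silently ignored.
    """
    selected = set(selected)
    subgraph = {api: DEPENDS_ON[api] & selected for api in selected}
    return list(TopologicalSorter(subgraph).static_order())

if __name__ == "__main__":
    chosen = {"sweep_probe", "detect_body_part", "move_to_start_pose", "apply_contact_force"}
    print(plan(chosen))
```

Because the selected APIs here form a single dependency chain, the resulting order is unique. With branching dependencies, several valid orders can exist, and a learned planner would have to choose among them.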
