USPilot: An Embodied Robotic Assistant Ultrasound System with Large Language Model Enhanced Graph Planner

Mingcong Chen, Siqi Fan, Guanglin Cao, Yun-hui Liu, Hongbin Liu

TL;DR

USPilot tackles the global shortage of skilled sonographers by introducing an embodied robotic ultrasound system guided by an LLM-enhanced graph planner. The framework uses a semantic router to interpret user queries and route them to either ultrasound knowledge adapters or the LLMEG planning module, which combines a GNN encoder-decoder with a subgraph generator to plan API sequences for autonomous scanning. Key contributions include the LLM-enhanced Graph Neural Network for API selection, adapter-based ultrasound knowledge embedding, and a real-world robotic ultrasound demonstration of autonomous scanning and question answering. The results indicate improved task planning accuracy and practical potential for unmanned medical imaging. They also highlight two limitations: generalization to unseen body parts and dependence on a fixed API set. These point to future work on multimodal, real-time control integration and back-end generalization.

Abstract

In the era of Large Language Models (LLMs), embodied artificial intelligence presents transformative opportunities for robotic manipulation tasks. Ultrasound imaging, a widely used and cost-effective medical diagnostic procedure, faces challenges due to the global shortage of professional sonographers. To address this issue, we propose USPilot, an embodied robotic assistant ultrasound system powered by an LLM-based framework to enable autonomous ultrasound acquisition. USPilot is designed to function as a virtual sonographer, capable of responding to patients' ultrasound-related queries and performing ultrasound scans based on user intent. By fine-tuning the LLM, USPilot demonstrates a deep understanding of ultrasound-specific questions and tasks. Furthermore, USPilot incorporates an LLM-enhanced Graph Neural Network (GNN) to manage ultrasound robotic APIs and serve as a task planner. Experimental results show that the LLM-enhanced GNN achieves unprecedented accuracy in task planning on public datasets. Additionally, the system demonstrates significant potential in autonomously understanding and executing ultrasound procedures. These advancements bring us closer to achieving autonomous and potentially unmanned robotic ultrasound systems, addressing critical resource gaps in medical imaging.

Paper Structure

This paper contains 13 sections, 7 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: The overview of USPilot: Answer the sonographer's normal medical questions or perform an automated ultrasound scan.
  • Figure 2: The structure of USPilot: (1) The semantic router recognizes the user’s intent. (2) If the intent is an executable task, LLMEG is invoked using the cached, unadapted Transformer to select potential APIs. (3) The LLM-based subgraph generator reorders the selected APIs into a directed graph.
  • Figure 3: The structure of LLMEG: Select a toolchain $G^*$ from $G$ based on the textual information provided by $I$.
  • Figure 4: The dynamic routing mechanism switches the forward path among adapters according to the user’s instructions.
  • Figure 5: Categorical distribution of organs and number of APIs used.
  • ...and 3 more figures
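
The three stages named in the Figure 2 caption (intent routing, API selection, and reordering the selected APIs into a directed graph) can be illustrated with a small sketch. This is not the authors' implementation: keyword matching stands in for both the semantic router and the GNN-based selector, a plain topological sort stands in for the LLM-based subgraph generator, and every API name, keyword set, and dependency below is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical API catalogue (not from the paper): each entry lists
# trigger keywords and the APIs that must run before it.
API_GRAPH = {
    "locate_body_part": {"keywords": {"scan", "locate"}, "requires": set()},
    "apply_gel": {"keywords": {"scan"}, "requires": {"locate_body_part"}},
    "move_probe": {"keywords": {"scan", "move"},
                   "requires": {"locate_body_part", "apply_gel"}},
    "capture_image": {"keywords": {"scan", "image"}, "requires": {"move_probe"}},
}

# Hypothetical vocabulary marking a query as an executable task.
TASK_WORDS = {"scan", "move", "capture", "locate"}


def route(query):
    """Stage 1 stand-in: label the query as a 'task' or a 'question'."""
    words = set(query.lower().split())
    return "task" if words & TASK_WORDS else "question"


def select_apis(query):
    """Stage 2 stand-in: pick candidate APIs whose keywords appear in the query."""
    words = set(query.lower().split())
    return {name for name, spec in API_GRAPH.items() if spec["keywords"] & words}


def order_apis(selected):
    """Stage 3 stand-in: order the selected APIs so prerequisites come first."""
    # Keep only dependencies that were themselves selected.
    deps = {name: API_GRAPH[name]["requires"] & selected for name in selected}
    return list(TopologicalSorter(deps).static_order())


def handle(query):
    """Run the full pipeline and return (intent, ordered API names)."""
    if route(query) == "question":
        return ("question", [])
    return ("task", order_apis(select_apis(query)))


if __name__ == "__main__":
    print(handle("please scan my thyroid"))
    print(handle("what is an ultrasound?"))
```

In this sketch a question never reaches the planner, which mirrors the split between the knowledge adapters and LLMEG described in the TL;DR. The dependency filter in `order_apis` is a design choice of the sketch: an API selected without its prerequisites is still scheduled, which a real planner would likely treat as an error.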