RAG-RUSS: A Retrieval-Augmented Robotic Ultrasound for Autonomous Carotid Examination

Dianye Huang; Ziping Cong; Nassir Navab; Zhongliang Jiang

TL;DR

RAG-RUSS is an interpretable framework that performs a full carotid examination in accordance with the clinical workflow while explicitly explaining both the current stage and the next planned action. It incorporates retrieval-augmented generation to enhance generalization and reduce dependence on large-scale training datasets.

Abstract

Robotic ultrasound (US) has recently attracted increasing attention as a means to overcome the limitations of conventional US examinations, such as strong operator dependence. However, the decision-making process of existing methods is often either rule-based or reliant on end-to-end learning models that operate as black boxes. This is seen as a major barrier to clinical acceptance and raises safety concerns for widespread adoption in routine practice. To tackle this challenge, we introduce RAG-RUSS, an interpretable framework capable of performing a full carotid examination in accordance with the clinical workflow while explicitly explaining both the current stage and the next planned action. Furthermore, given the scarcity of medical data, we incorporate retrieval-augmented generation to enhance generalization and reduce dependence on large-scale training datasets. The method was trained on data acquired from 28 volunteers, while an additional four volumetric scans recorded from previously unseen volunteers were reserved for testing. The results demonstrate that the method can explain the current scanning stage and autonomously plan probe motions to complete the carotid examination, encompassing both transverse and longitudinal planes.

Paper Structure (22 sections, 3 equations, 6 figures, 4 tables)

Introduction
Examination Workflow and Data Preparation
Carotid Ultrasound Examination Workflow
Clinical Workflow for Carotid Artery Examination
Scanning Stage Definition
Ultrasound Simulation Using Human Volumetric Data
Human Volumetric Data Acquisition
Expert Carotid Examination Demonstration Generation
Dataset Structure
Method
Structure of RAG-RUSS
Large Language Model
Ultrasound Image Encoder
Cross-Modality Projector
RAG-based In-Context Learning
...and 7 more sections

Figures (6)

Figure 1: A representative use case of the RAG-RUSS system. Given a clinician’s query, RAG-RUSS retrieves similar annotated scans via RAG, then identifies the stage, generates an explanation, and predicts the next_API. This API is executed by the robotic arm, and the acquired US images are fed back into the system, forming a closed loop that enables autonomous, interpretable carotid artery scanning.
Figure 2: Overview of the carotid ultrasound scanning workflow and the data acquisition/processing pipeline. Left: the eight predefined scanning stages of the carotid artery examination. Right: US images and the probe’s pose trajectory are recorded and subsequently compounded into 3D volumetric data for use in the simulation environment.
Figure 3: Architecture and runtime inference of the proposed RAG-RUSS for carotid artery US scanning. Left: System architecture comprising an LLM backbone (Vicuna v1.5, 7B [zheng2023judging]), a medical vision foundation model (PubMedCLIP-ViT-B/32 [eslami2021does]), and a RAG module that retrieves similar scanning contexts to support decision-making. Right: Inputs/outputs of RAG-RUSS and the signal flow when deploying it for carotid US scanning. For more details on the inputs/outputs and the architecture of RAG-RUSS, refer to the prompt figure (fig:prompts) and the Method section, Subsection B, respectively.
Figure 4: An illustrative multi-turn QA example of RAG-RUSS. The inputs are: i) a system prompt that specifies the task description along with the predefined executable APIs and scanning stages; ii) two scanning contexts retrieved via the RAG component; iii) the current query, which includes the two input US images and the previously predicted stage and then poses three questions sequentially. RAG-RUSS then outputs: i) the current stage, ii) a short explanation, and iii) the next_API to execute.
Figure 5: Scheme for building a RAG database. The scanning context includes two US frames, the previous stage, and the associated VQA annotations.
...and 1 more figures
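
The retrieval step described in the captions above (retrieve similar scanning contexts, then place them in front of the current query) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `ScanContext` fields, the cosine-similarity ranking, the prompt layout, and all stage and API names are assumptions made for the example. Only the number of retrieved contexts (two) and the three answers (stage, explanation, next_API) come from the captions.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ScanContext:
    """Hypothetical record for one entry of the RAG database."""
    embedding: List[float]  # assumed feature vector summarizing the two US frames
    previous_stage: str
    stage: str
    explanation: str
    next_api: str


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(database: List[ScanContext], query: List[float], k: int = 2) -> List[ScanContext]:
    """Return the k contexts most similar to the query (similarity measure is an assumption)."""
    ranked = sorted(database, key=lambda c: cosine(c.embedding, query), reverse=True)
    return ranked[:k]


def build_prompt(retrieved: List[ScanContext], previous_stage: str) -> str:
    """Assemble an in-context prompt: retrieved examples first, then the current query."""
    lines = []
    for i, ctx in enumerate(retrieved, start=1):
        lines.append(
            f"Example {i}: previous_stage={ctx.previous_stage}; stage={ctx.stage}; "
            f"explanation={ctx.explanation}; next_API={ctx.next_api}"
        )
    lines.append(
        f"Query: previous_stage={previous_stage}; "
        "what is the current stage, why, and which next_API should run?"
    )
    return "\n".join(lines)


# Toy database with placeholder stage and API names (not taken from the paper).
database = [
    ScanContext([1.0, 0.0], "stage_0", "stage_1", "placeholder explanation A", "api_a"),
    ScanContext([0.9, 0.1], "stage_1", "stage_2", "placeholder explanation B", "api_b"),
    ScanContext([0.0, 1.0], "stage_6", "stage_7", "placeholder explanation C", "api_c"),
]

retrieved = retrieve(database, [1.0, 0.05], k=2)
prompt = build_prompt(retrieved, previous_stage="stage_1")
```

In a deployed loop, the model's predicted next_API would be executed by the robot and the newly acquired US images would form the next query; that part is omitted here because it depends on hardware and model interfaces not described in this section.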
