UltraHiT: A Hierarchical Transformer Architecture for Generalizable Internal Carotid Artery Robotic Ultrasonography

Teng Wang, Haojun Jiang, Yuxuan Wang, Zhenguo Sun, Xiangjie Yan, Xiang Li, Gao Huang

TL;DR

UltraHiT tackles the challenging problem of autonomous ICA longitudinal ultrasonography by introducing a hierarchical transformer that couples high-level variation assessment with two specialized low-level executors. By gating between a knowledge-based standardized path and a data-driven adaptive corrector using history-rich causal transformers, the approach effectively handles ICA anatomical variability. The authors contribute the first large ICA scanning dataset (164 trajectories, 72K samples from 28 subjects) and demonstrate a 95% success rate on unseen individuals, with strong real-world robustness to initialization and motion. The framework advances robotic ultrasound toward clinically practical ICA scanning and provides a scalable blueprint for handling anatomical variability in medical robotics.

Abstract

Carotid ultrasound is crucial for the assessment of cerebrovascular health, particularly the internal carotid artery (ICA). While previous research has explored automating carotid ultrasound, none has tackled the challenging ICA. This is primarily due to its deep location, tortuous course, and significant individual variations, which greatly increase scanning complexity. To address this, we propose a Hierarchical Transformer-based decision architecture, namely UltraHiT, that integrates high-level variation assessment with low-level action decision. Our motivation stems from conceptualizing individual vascular structures as morphological variations derived from a standard vascular model. The high-level module identifies variation and switches between two low-level modules: an adaptive corrector for variations, or a standard executor for normal cases. Specifically, both the high-level module and the adaptive corrector are implemented as causal transformers that generate predictions based on the historical scanning sequence. To ensure generalizability, we collected the first large-scale ICA scanning dataset comprising 164 trajectories and 72K samples from 28 subjects of both genders. Based on the above innovations, our approach achieves a 95% success rate in locating the ICA on unseen individuals, outperforming baselines and demonstrating its effectiveness. Our code will be released after acceptance.
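The switching behaviour described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is not yet released). The names `HierarchicalPolicy`, `toy_gate`, `toy_corrector`, and `toy_standard_executor` are hypothetical, and simple hand-written functions stand in for the causal transformers and the knowledge-based executor. The sketch shows only the control flow: at each step a high-level gate reads the scanning history and routes the decision to one of two low-level modules.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# All types below are illustrative assumptions, not taken from the paper.
Observation = Tuple[float, ...]   # stand-in for ultrasound image features
Action = Tuple[float, float]      # stand-in for a probe motion command
History = List[Tuple[Observation, Action]]


@dataclass
class HierarchicalPolicy:
    """High-level gate that routes each step to one of two low-level modules."""
    gate: Callable[[History, Observation], bool]         # True means a variation was detected
    corrector: Callable[[History, Observation], Action]  # adaptive, history-conditioned
    standard_executor: Callable[[int], Action]           # follows a fixed standard path
    history: History = field(default_factory=list)
    path_step: int = 0                                   # progress along the standard path

    def act(self, obs: Observation) -> Tuple[str, Action]:
        # Causal by construction: only past (observation, action) pairs and
        # the current observation are visible when the decision is made.
        if self.gate(self.history, obs):
            mode, action = "corrector", self.corrector(self.history, obs)
        else:
            mode, action = "standard", self.standard_executor(self.path_step)
            self.path_step += 1
        self.history.append((obs, action))
        return mode, action


def toy_gate(history: History, obs: Observation) -> bool:
    # Hypothetical rule: treat obs[0] as a deviation score from the standard model.
    return abs(obs[0]) > 0.5


def toy_corrector(history: History, obs: Observation) -> Action:
    # Hypothetical correction: move against the measured deviation.
    return (-obs[0], 0.0)


def toy_standard_executor(step: int) -> Action:
    # Hypothetical standard path: always advance one unit along the vessel.
    return (0.0, 1.0)


def run_demo() -> List[str]:
    policy = HierarchicalPolicy(toy_gate, toy_corrector, toy_standard_executor)
    return [policy.act(obs)[0] for obs in [(0.0,), (0.8,), (0.1,)]]
```

In this toy run the second observation exceeds the deviation threshold, so only that step is routed to the corrector. In the paper, both the gate and the corrector are learned causal transformers over the historical scanning sequence rather than fixed rules.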

Paper Structure

This paper contains 15 sections, 20 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview. (a) Robotic scanning vs. manual scanning. (b) Characteristics of ICA vs. CCA. (c) Our hierarchical transformer architecture vs. previous works.
  • Figure 2: Internal carotid artery's anatomy and variability. The left panel shows the standard anatomy of the ICA [gray1878anatomy], while the right panel demonstrates the significant variability in the position and course of the ICA within the population.
  • Figure 3: Hierarchical transformer architecture. (a) Overview of the hierarchical architecture. The high-level module makes semantic decisions, while the low-level module executes physical actions in the real world. (b) The corrective gate and adaptive corrector, which process image-action sequences through a causal transformer. (c) The stop model architecture. (d) The standardized path executor, a knowledge-based policy designed using anatomical prior knowledge.
  • Figure 4: Hardware and control configuration of the system.
  • Figure 5: Failure reasons for different methods.
  • ...and 3 more figures