Sample-Efficient Learning with Online Expert Correction for Autonomous Catheter Steering in Endovascular Bifurcation Navigation

Hao Wang; Tianliang Yao; Bo Lu; Zhiqiang Pei; Liu Dong; Lei Ma; Peng Qi

TL;DR

Results indicate that combining sample-efficient RL with online expert correction enables reliable and accurate catheter steering, particularly in anatomically challenging bifurcation scenarios critical for endovascular navigation.

Abstract

Robot-assisted endovascular intervention offers a safe and effective solution for remote catheter manipulation, reducing radiation exposure while enabling precise navigation. Reinforcement learning (RL) has recently emerged as a promising approach for autonomous catheter steering; however, conventional methods suffer from sparse reward design and reliance on static vascular models, which limits their sample efficiency and their generalization to intraoperative variations. To overcome these challenges, this paper introduces a sample-efficient RL framework with online expert correction for autonomous catheter steering in endovascular bifurcation navigation. The proposed framework integrates three key components: (1) a segmentation-based pose estimation module for accurate real-time state feedback, (2) a fuzzy controller for bifurcation-aware orientation adjustment, and (3) a structured reward generator incorporating expert priors to guide policy learning. By leveraging online expert correction, the framework reduces exploration inefficiency and enhances policy robustness in complex vascular structures. Experimental validation on a robotic platform using a transparent vascular phantom demonstrates that the proposed approach converges in 123 training episodes, a 25.9% reduction compared with the baseline Soft Actor-Critic (SAC) algorithm, while reducing average positional error to 83.8% of the baseline. These results indicate that combining sample-efficient RL with online expert correction enables reliable and accurate catheter steering, particularly in anatomically challenging bifurcation scenarios critical for endovascular navigation.
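
As a rough reading of the two headline numbers, and assuming the 25.9% reduction is measured relative to the baseline SAC episode count (the abstract does not state that count directly; $N_{\text{SAC}}$ is a symbol introduced here for illustration), the implied baseline and the relative error reduction work out to:

$$
N_{\text{SAC}} \approx \frac{123}{1 - 0.259} \approx 166 \ \text{episodes}, \qquad 1 - 0.838 = 0.162 \ \ (\text{about } 16.2\% \text{ lower average positional error}).
$$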

Paper Structure (18 sections, 16 equations, 5 figures, 1 table, 1 algorithm)

Introduction
Methodology
System Overview
Problem Formulation
Catheter Modeling with Online Expert Correction Pose Mapping for Robotic Control
Skeleton Extraction
Control Input Mapping
Fuzzy Control based on Expert Correction
Expert Experience Guided Trajectory Optimization
Exploration-Oriented Reinforcement Learning with Soft Actor–Critic Algorithm
Expert-Guided Policy Shaping via Generative Adversarial Imitation Learning
Hybrid Reward Scheduling for Balancing Exploration and Imitation
Experiments and Results
Implementation Details
Model Performance Evaluation
...and 3 more sections

Figures (5)

Figure 1: Illustration of endovascular navigation enhanced by expert knowledge and behavior modeling. The framework utilizes intraoperative imaging to detect the real-time position and orientation of the catheter tip. By combining expert procedural patterns with reinforcement learning strategies, the system dynamically adjusts navigation to ensure safe and efficient traversal through vascular bifurcations. This hybrid approach improves accuracy and reliability in reaching target anatomical sites during complex interventions. The schematic was created using BioRender (https://biorender.com).
Figure 2: Expert-in-the-loop catheter navigation framework integrating reinforcement learning and fuzzy control. (a) Robotic catheterization setup and agent–environment interaction: the agent observes the vascular state $s_t$ and outputs action $a_t$ to command catheter translation, rotation, and gripping. (b) Policy learning that combines Soft Actor–Critic (SAC) with Generative Adversarial Imitation Learning (GAIL): mini-batches from a replay buffer train an actor–critic with twin critics ($Q_1$, $Q_2$). A discriminator supplies an expert reward shaped by prior knowledge, and the policy is optimized using a weighted sum of SAC and GAIL rewards ($w_{\text{SAC}}$, $w_{\text{GAIL}}$). Bifurcation detection triggers the expert-in-the-loop module. (c) Online fuzzy pose correction at bifurcations: an expert selects a target pose; a U-Net-based segmentation extracts the catheter mask, followed by smoothing/erosion and geometry fitting for real-time pose estimation. Translation and rotation errors feed a fuzzy controller that adjusts robot commands to reach the expert-corrected pose.
Figure 3: Autonomous catheter navigation and constant-curvature modeling. (a) After training, the learned policy autonomously steers the catheter to the predefined target region in a silicone renal artery phantom by continuously localizing the tip in fluoroscopic frames. (b) Upon bifurcation detection, the catheter centerline (skeleton) is extracted and fitted with a constant-curvature model. Directional consistency is assessed by the sign/magnitude of the cross product between the fitted tangent and the skeleton tangent; the resulting geometric errors drive an expert-in-the-loop fuzzy controller that updates robot commands to achieve the expert-specified pose. Insets report the tip–target distance (pixels) and cross-product values over time.
Figure 4: Comparison of algorithm performance metrics. (a) Normalized comparison of key performance indicators across different RL algorithms. (b) Average reward per episode during SAC-EIL-GAIL training. The reward gradually converges as the number of episodes increases, indicating stable and effective learning.
Figure 5: Error distributions of TD3, SAC, SAC-GAIL, SAC-EIL, and SAC-EIL-GAIL. The boxplots show the central tendency and variability of each method, and the red asterisks indicate statistically significant differences from SAC-EIL-GAIL ($^{*}p < 0.05$, $^{**}p < 0.01$, $^{***}p < 0.001$).
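
Two quantities named in the captions above can be illustrated with a minimal Python sketch: the weighted sum of SAC and GAIL rewards (Figure 2b) and the cross product between the fitted tangent and the skeleton tangent (Figure 3b). This is not the authors' implementation. The function names, the default weights, and the choice to normalize the tangents to unit length in a 2D image plane are all assumptions made for illustration; the paper's actual reward schedule and geometry pipeline are not reproduced here.

```python
import numpy as np


def hybrid_reward(r_sac, r_gail, w_sac=0.5, w_gail=0.5):
    """Weighted sum of the task (SAC) reward and the discriminator (GAIL) reward.

    The default weights are placeholders (assumption); the captions only say
    that a weighted sum with w_SAC and w_GAIL is used.
    """
    return w_sac * r_sac + w_gail * r_gail


def tangent_cross(t_fit, t_skel):
    """Scalar 2D cross product between two tangent directions.

    Both vectors are normalized first (assumption), so the magnitude equals
    |sin(angle)| between them and the sign tells which way the fitted tangent
    must rotate to align with the skeleton tangent.
    """
    a = np.asarray(t_fit, dtype=float)
    b = np.asarray(t_skel, dtype=float)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(a[0] * b[1] - a[1] * b[0])


if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the functions.
    print(hybrid_reward(1.0, 3.0, w_sac=0.25, w_gail=0.75))
    print(tangent_cross([1.0, 0.0], [0.0, 1.0]))
    print(tangent_cross([1.0, 0.0], [1.0, 0.0]))
```

A value of `tangent_cross` near zero means the two tangents are parallel or anti-parallel, so a practical check would likely pair it with a dot product; a nonzero value could serve as the rotation error fed to a controller such as the fuzzy controller described in Figure 2c.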
