Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning

Harry Robertshaw, Lennart Karstensen, Benjamin Jackson, Alejandro Granados, Thomas C. Booth

TL;DR

This work tackles autonomous navigation of catheters and guidewires during mechanical thrombectomy (MT) by learning reward signals from expert demonstrations via inverse reinforcement learning (IRL) in a SOFA-based vascular simulator. A soft actor-critic controller with an LSTM observation embedder is trained under three reward schemes: a dense reward, an IRL-derived reward, and a reward-shaping combination of the two. Reward shaping yields the best overall performance, achieving 100% success and the fastest navigation times, while dual-device tracking reproduces expert-like strategies by using the catheter early for stabilization. The findings demonstrate the potential of IRL-informed reward shaping to enhance autonomous endovascular navigation; future work is needed to generalize across varied anatomies and to extend the approach to full MT procedures, both in vitro and clinically.
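The three reward schemes named above can be illustrated with a minimal sketch. It is not the authors' implementation, and everything in it is an assumption: the dense term (progress along the remaining path to the target, a per-step penalty and a success bonus), the linear form of the IRL-derived reward, the weighted-sum combination, and all names and constants (`StepInfo`, `irl_weight`, `step_penalty`, and so on) are hypothetical.

```python
"""Hypothetical sketch of three reward schemes for one navigation step.

Assumptions (not taken from the paper): the dense reward is the reduction in
remaining path length to the target minus a per-step penalty, plus a bonus on
success; the IRL-derived reward is linear in a state feature vector; reward
shaping is a weighted sum of the two.
"""
from dataclasses import dataclass
from typing import Sequence


@dataclass
class StepInfo:
    prev_path_to_target_mm: float  # remaining centerline distance before the action
    path_to_target_mm: float       # remaining centerline distance after the action
    reached_target: bool
    features: Sequence[float]      # hypothetical state features phi(s) for the IRL reward


def dense_reward(info: StepInfo, step_penalty: float = 0.005,
                 target_bonus: float = 1.0, scale: float = 0.01) -> float:
    # Reward progress along the path, penalise each step, add a bonus at the target.
    progress = info.prev_path_to_target_mm - info.path_to_target_mm
    reward = scale * progress - step_penalty
    if info.reached_target:
        reward += target_bonus
    return reward


def irl_reward(info: StepInfo, weights: Sequence[float]) -> float:
    # Linear reward r(s) = w . phi(s), where w would be inferred by IRL.
    if len(weights) != len(info.features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, info.features))


def shaped_reward(info: StepInfo, weights: Sequence[float],
                  irl_weight: float = 0.5) -> float:
    # Hypothetical shaping: dense term plus a weighted IRL-derived term.
    return dense_reward(info) + irl_weight * irl_reward(info, weights)
```

A weighted sum is only one way to combine the two signals; potential-based shaping, which adds γΦ(s′) − Φ(s) for some potential Φ, is a common alternative that leaves the optimal policy unchanged.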

Abstract

Purpose: Autonomous navigation of catheters and guidewires can enhance the safety and efficacy of endovascular surgery, reducing procedure times and operator radiation exposure. Integrating tele-operated robotics could widen access to time-sensitive emergency procedures such as mechanical thrombectomy (MT). Reinforcement learning (RL) shows potential for endovascular navigation, but applying it is challenging when no reward signal is available. This study explores the viability of autonomous navigation in MT vasculature using inverse RL (IRL) to leverage expert demonstrations. Methods: We established a simulation-based training and evaluation environment for MT navigation. We used IRL to infer reward functions from expert behaviour when navigating a guidewire and catheter. We used soft actor-critic to train models with various reward functions and compared their performance in silico. Results: We demonstrated the feasibility of navigation using IRL. When comparing single versus dual device tracking (i.e. guidewire only versus catheter and guidewire), both methods achieved high success rates of 95% and 96%, respectively. Dual tracking, however, used both devices in a way that mimicked an expert. Training with a reward function obtained through reward shaping yielded a success rate of 100% and a procedure time of 22.6 s. This outperformed a dense reward function (96%, 24.9 s) and an IRL-derived reward function (48%, 59.2 s). Conclusions: By employing IRL, we have contributed to the advancement of autonomous navigation in endovascular interventions, particularly MT. The results underscore the potential of reward shaping for training models, offering a promising avenue for enhancing the accessibility and precision of MT. We envisage that future research can extend our methodology to diverse anatomical structures to enhance generalizability.
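The abstract does not specify which IRL algorithm is used to infer reward functions from expert behaviour, so the sketch below shows one generic technique for a linear reward r(s) = w · φ(s): the feature-expectation-matching update that forms the gradient of maximum-entropy IRL. The feature vectors, learning rate and discount factor are hypothetical placeholders, and this should not be read as the paper's method.

```python
"""Minimal sketch of a linear-reward IRL update by feature-expectation matching.

Generic technique (the max-entropy IRL gradient for a linear reward), not
necessarily the IRL algorithm used in the paper. Each trajectory is a list of
per-step feature vectors phi(s_t); all inputs are hypothetical.
"""
from typing import List, Sequence

Trajectory = List[Sequence[float]]


def feature_expectations(trajectories: List[Trajectory],
                         gamma: float = 0.99) -> List[float]:
    # Average discounted sum of features over a set of trajectories.
    dim = len(trajectories[0][0])
    mu = [0.0] * dim
    for traj in trajectories:
        for t, phi in enumerate(traj):
            discount = gamma ** t
            for i in range(dim):
                mu[i] += discount * phi[i]
    return [m / len(trajectories) for m in mu]


def irl_gradient_step(weights: Sequence[float], expert: List[Trajectory],
                      learner: List[Trajectory], lr: float = 0.1,
                      gamma: float = 0.99) -> List[float]:
    # Increase weights on features the expert accumulates more than the learner,
    # and decrease weights on features the learner over-visits.
    mu_e = feature_expectations(expert, gamma)
    mu_l = feature_expectations(learner, gamma)
    return [w + lr * (e - l) for w, e, l in zip(weights, mu_e, mu_l)]
```

In a full loop, the policy (for example a soft actor-critic agent) would be retrained under the updated reward and fresh learner trajectories sampled before the next step; the weights stop changing once the learner's discounted feature expectations match the expert's.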

Paper Structure (15 sections, 6 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: MT environment used in simulations: (a) with anatomy labelled and all possible targets; (b) with the insertion point and an example navigation path to a target in each common carotid artery.
  • Figure 2: (a) Success rate (%) and (b) path ratio (%) during training for single versus dual device tracking.
  • Figure 3: Trajectories of the catheter and guidewire tips for (a) demonstrator data, (b) single device tracking (final catheter position highlighted), (c) dual device tracking (final catheter position highlighted), and (d) IRL.
  • Figure 4: (a) Success rate and (b) path ratio during training for a dense reward function, an IRL-derived reward function, and a reward shaping function.
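The training curves in Figures 2 and 4 report success rate and path ratio, and the abstract also reports procedure time. The sketch below computes all three from episode records under assumed definitions: path ratio is taken here as the percentage of the initial insertion-to-target path length that was covered, and procedure time is averaged over successful episodes only. The `Episode` fields and these definitions are hypothetical, not taken from the paper.

```python
"""Hypothetical evaluation metrics for simulated navigation episodes.

Assumed definitions (not from the paper): success rate is the percentage of
episodes reaching the target; path ratio is the percentage of the initial
insertion-to-target path covered, clipped to [0, 100]; procedure time is the
mean duration of successful episodes.
"""
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Episode:
    success: bool
    initial_path_mm: float  # path length from insertion point to target at start
    final_path_mm: float    # remaining path length to target at episode end
    duration_s: float       # simulated procedure time


def path_ratio(ep: Episode) -> float:
    # Fraction of the initial path covered, clipped to [0, 1], as a percentage.
    covered = ep.initial_path_mm - ep.final_path_mm
    return 100.0 * min(max(covered / ep.initial_path_mm, 0.0), 1.0)


def summarize(episodes: List[Episode]) -> dict:
    successes = [ep for ep in episodes if ep.success]
    # Procedure time is undefined (None) when no episode succeeds.
    time_s: Optional[float] = (
        mean(ep.duration_s for ep in successes) if successes else None
    )
    return {
        "success_rate_pct": 100.0 * len(successes) / len(episodes),
        "path_ratio_pct": mean(path_ratio(ep) for ep in episodes),
        "procedure_time_s": time_s,
    }
```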