Autonomous navigation of catheters and guidewires in mechanical thrombectomy using inverse reinforcement learning
Harry Robertshaw, Lennart Karstensen, Benjamin Jackson, Alejandro Granados, Thomas C. Booth
TL;DR
This work tackles autonomous navigation of catheters and guidewires during mechanical thrombectomy by learning reward signals from expert demonstrations via inverse reinforcement learning in a SOFA-based vascular simulator. A soft actor-critic controller, equipped with an LSTM observation embedder, is trained under three reward schemes: a dense reward, an IRL-derived reward, and a reward-shaping combination of the two. Reward shaping yields the best overall performance, achieving a 100% success rate and the shortest procedure time (22.6 s), while dual-device tracking reproduces expert-like strategies by using the catheter early for stabilization. The findings demonstrate the potential of IRL-informed reward shaping to enhance autonomous endovascular navigation, although further work is needed to generalize across varied anatomies and to extend the approach to full MT procedures in vitro and in the clinic.
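The three reward schemes can be illustrated with a minimal sketch. Everything in it is an assumption rather than the authors' implementation: the form of the dense reward (progress toward the target minus a small per-step penalty, plus a bonus on reaching it), the linear form of the IRL reward over hypothetical state features, and the weighted-sum rule with coefficient `beta` used to combine them. The function names `dense_reward`, `irl_reward`, and `shaped_reward` are hypothetical.

```python
from typing import Sequence


def dense_reward(prev_dist: float, dist: float, reached: bool,
                 step_penalty: float = 0.005, success_bonus: float = 1.0) -> float:
    """Hypothetical dense reward.

    Rewards progress toward the target (decrease in remaining distance),
    subtracts a small per-step penalty, and adds a bonus on success.
    """
    reward = (prev_dist - dist) - step_penalty
    if reached:
        reward += success_bonus
    return reward


def irl_reward(features: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed linear IRL reward: dot product of state features and weights
    that would be inferred from expert demonstrations."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return sum(f * w for f, w in zip(features, weights))


def shaped_reward(prev_dist: float, dist: float, reached: bool,
                  features: Sequence[float], weights: Sequence[float],
                  beta: float = 1.0) -> float:
    """Assumed shaping rule: dense reward plus a beta-weighted IRL term."""
    return dense_reward(prev_dist, dist, reached) + beta * irl_reward(features, weights)
```

For example, `shaped_reward(10.0, 8.0, False, [1.0, 2.0], [0.5, -0.25])` returns approximately 1.995: the dense term contributes the 2.0 units of progress minus the step penalty, and the IRL term cancels to zero for these features. A weighted sum is only one way to combine the two signals; the abstract does not state the exact rule used.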
Abstract
Purpose: Autonomous navigation of catheters and guidewires can enhance the safety and efficacy of endovascular surgery, reducing procedure times and operator radiation exposure. Integrating tele-operated robotics could widen access to time-sensitive emergency procedures such as mechanical thrombectomy (MT). Reinforcement learning (RL) shows potential for endovascular navigation, but applying it is difficult when no well-defined reward signal is available. This study explores the viability of autonomous navigation in MT vasculature using inverse RL (IRL) to leverage expert demonstrations.
Methods: We established a simulation-based training and evaluation environment for MT navigation. We used IRL to infer reward functions from expert behaviour when navigating a guidewire and catheter. We used soft actor-critic to train models with different reward functions and compared their performance in silico.
Results: We demonstrated the feasibility of navigation using IRL. When comparing single-device with dual-device tracking (i.e. guidewire only versus catheter and guidewire), both achieved high success rates of 95% and 96%, respectively. Dual tracking, however, used both devices in a manner that mimicked an expert. Training with a reward function obtained through reward shaping gave a success rate of 100% and a procedure time of 22.6 s, outperforming a dense reward function (96%, 24.9 s) and an IRL-derived reward function (48%, 59.2 s).
Conclusions: By employing IRL, we have contributed to the advancement of autonomous navigation for endovascular interventions, particularly MT. The results underscore the potential of reward shaping for training models, offering a promising avenue for enhancing the accessibility and precision of MT. We envisage that future research can extend our methodology to diverse anatomical structures to improve generalizability.
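The two evaluation metrics reported above, success rate and procedure time, can be computed from a set of evaluation episodes roughly as follows. This is a minimal sketch under stated assumptions: the `Episode` record and `evaluate` function are hypothetical, and averaging procedure time over successful episodes only is an assumption, since the abstract does not say how failed episodes are counted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class Episode:
    """Hypothetical record of one evaluation episode."""
    success: bool       # whether the device tip reached the target
    duration_s: float   # simulated procedure time in seconds


def evaluate(episodes: List[Episode]) -> Tuple[float, Optional[float]]:
    """Return (success rate in %, mean procedure time in s).

    Assumption: procedure time is averaged over successful episodes only;
    it is None when no episode succeeded.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    successes = [e for e in episodes if e.success]
    success_rate = 100.0 * len(successes) / len(episodes)
    mean_time = mean(e.duration_s for e in successes) if successes else None
    return success_rate, mean_time
```

With four illustrative episodes lasting 20 s, 25 s and 30 s (successful) and 60 s (failed), `evaluate` returns `(75.0, 25.0)`. Excluding failures from the time average keeps the metric from being dominated by episodes that hit a step or time limit, but it also means the two numbers should always be read together.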
