SurgIRL: Towards Life-Long Learning for Surgical Automation by Incremental Reinforcement Learning

Yun-Jie Ho; Zih-Yun Chiu; Yuheng Zhi; Michael C. Yip

TL;DR

This work trains surgical automation policies through SurgIRL and develops incremental learning pipelines based on KIAN-ACE to accumulate and reuse learned knowledge and solve multiple surgical tasks sequentially.

Abstract

Surgical automation holds immense potential to improve the outcome and accessibility of surgery. Recent studies use reinforcement learning to learn policies that automate different surgical tasks. However, these policies are developed independently and are limited in their reusability when the task changes, making it more time-consuming when robots learn to solve multiple tasks. Inspired by how human surgeons build their expertise, we train surgical automation policies through Surgical Incremental Reinforcement Learning (SurgIRL). SurgIRL aims to (1) acquire new skills by referring to external policies (knowledge) and (2) accumulate and reuse these skills to solve multiple unseen tasks incrementally (incremental learning). Our SurgIRL framework includes three major components. We first define an expandable knowledge set containing heterogeneous policies that can be helpful for surgical tasks. Then, we propose Knowledge Inclusive Attention Network with mAximum Coverage Exploration (KIAN-ACE), which improves learning efficiency by maximizing the coverage of the knowledge set during the exploration process. Finally, we develop incremental learning pipelines based on KIAN-ACE to accumulate and reuse learned knowledge and solve multiple surgical tasks sequentially. Our simulation experiments show that KIAN-ACE efficiently learns to automate ten surgical tasks separately or incrementally. We also evaluate our learned policies on the da Vinci Research Kit (dVRK) and demonstrate successful sim-to-real transfers.

Paper Structure (14 sections, 7 equations, 7 figures, 2 tables)

Introduction
Related Work
Methods
Knowledge-Grounded Reinforcement Learning (KGRL)
Surgical Knowledge Set
Knowledge Inclusive Attention Network with mAximum Coverage Exploration (KIAN-ACE)
Incremental Learning for Surgical Tasks
Experiments and Results
Simulation Experiments
Experimental setup
Single-Task Learning
Incremental Learning
Real-Robot Experiments
Conclusion and Future work

Figures (7)

Figure 1: The dVRK incrementally learns to automate various surgical tasks, including endoscopic camera control and surgical manipulation. We propose a SurgIRL framework that enables surgical robots to learn multiple tasks by accumulating knowledge over tasks with the utmost flexibility. Each row in this figure is a sequence of tasks being incrementally learned. The diversity of the tasks demonstrates the flexibility and effectiveness of our framework.
Figure 2: Visualization of three external knowledge policies for surgical tasks. The first policy guides a surgical manipulator to approach an object or move to a point. The second policy moves an arm with an object in hand toward a target. The third policy involves two arms trying to hand over an object.
Figure 3: The policy architecture of KIAN-ACE \cite{chiu2024flexible}. Given a state $\mathbf{s}_t$, the query $\Phi$ outputs a vector $\mathbf{u}_t$. Then, $\mathbf{u}_t$ attends to each knowledge key, $\mathbf{k}_{in}, \mathbf{k}_{g_1}, \dots$, to calculate the weight of each policy. These weights are used to perform knowledge sampling among all policies, $\pi_{in}, \pi_{g_1}, \dots$. Finally, an action is generated from the sampled policy, $\pi_e$.
Figure 4: SurgIRL's three incremental learning pipelines. The red blocks indicate the components reused from one task to another. Each pipeline has its best-fitted scenarios based on state/action spaces and similarity of tasks. Fig. \ref{fig:reuse_keys} suits tasks with different state/action spaces. Fig. \ref{fig:reuse_query_keys} suits tasks with the same state/action spaces but with different environmental dynamics. Fig. \ref{fig:reuse_all} suits tasks with greater similarity.
Figure 5: The performance of external policies, RL \cite{haarnoja2018soft}, KIAN \cite{chiu2024flexible}, and KIAN-ACE in single-task learning. KIAN-ACE outperforms other methods and has more consistent training results, demonstrating the effectiveness of its exploration strategy.
...and 2 more figures
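
The action-selection flow described in the Figure 3 caption (query, attention over knowledge keys, knowledge sampling, action from the sampled policy) can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product scoring, the softmax normalization, the identity query, and every name below (`attention_weights`, `sample_policy_index`, `act`, the toy keys and policies) are assumptions made for illustration, and the maximum coverage exploration component of KIAN-ACE is not modeled.

```python
import math
import random


def attention_weights(u_t, keys):
    """Turn query-key similarities into one weight per policy.

    Assumption: similarity is a dot product and weights come from a softmax.
    The caption only states that u_t attends to each knowledge key.
    """
    scores = [sum(q * k for q, k in zip(u_t, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_policy_index(weights, rng):
    """Knowledge sampling: draw one policy index in proportion to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def act(state, query_fn, keys, policies, rng):
    """Map a state to an action through the sampled policy."""
    u_t = query_fn(state)                    # query output for this state
    weights = attention_weights(u_t, keys)   # one weight per knowledge key
    idx = sample_policy_index(weights, rng)  # pick the policy to execute
    return policies[idx](state), idx, weights


# Hypothetical toy setup: one inner policy and two external knowledge policies.
rng = random.Random(0)
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
names = ["inner", "approach", "handover"]
policies = [lambda s, n=n: n for n in names]  # each toy policy returns its name
action, idx, weights = act([0.5, 2.0], lambda s: s, keys, policies, rng)
```

In this toy setup each policy is a stub that returns its own name, so `action` only reveals which policy was sampled. In a learned system the query and the keys would be trained parameters rather than fixed vectors.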
