BronchoCopilot: Towards Autonomous Robotic Bronchoscopy via Multimodal Reinforcement Learning

Jianbo Zhao, Hao Chen, Qingyao Tian, Jian Chen, Bingyu Yang, Hongbin Liu

TL;DR

BronchoCopilot is a multimodal RL agent designed to acquire manipulation skills for autonomous bronchoscopy. In a realistic simulation environment, it attains a success rate of approximately 90% in fifth-generation airways with consistent movements and adapts robustly to diverse cases.

Abstract

Bronchoscopy plays a significant role in the early diagnosis and treatment of lung diseases. The procedure requires physicians to maneuver a flexible endoscope to reach distal lesions, and it demands substantial expertise when examining the airways of the upper lung lobe. With the development of artificial intelligence and robotics, reinforcement learning (RL) methods have been applied to the manipulation of interventional surgical robots. However, unlike human physicians, who draw on multimodal information, most current RL methods rely on a single modality, which limits their performance. In this paper, we propose BronchoCopilot, a multimodal RL agent designed to acquire manipulation skills for autonomous bronchoscopy. BronchoCopilot integrates images from the bronchoscope camera with estimated robot poses, aiming for a higher success rate in challenging airway environments. We employ auxiliary reconstruction tasks to compress the multimodal data and use attention mechanisms to obtain an efficient latent representation of it, which serves as input to the RL module. The framework adopts a stepwise training and fine-tuning approach to reduce training difficulty. Our evaluation in a realistic simulation environment shows that BronchoCopilot, by effectively harnessing multimodal information, attains a success rate of approximately 90% in fifth-generation airways with consistent movements. It also demonstrates a robust capacity to adapt to diverse cases.
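To make the attention-based fusion step more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each modality has already been compressed to an embedding of the same length, and it uses parameter-free scaled dot-product self-attention followed by mean pooling. The function name `attention_fuse`, the embedding values, and the pooling choice are all hypothetical; a trained model would normally add learned query, key, and value projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fuse(embeddings):
    """Fuse per-modality embeddings into one state vector.

    Assumption of this sketch: each embedding serves as query, key, and
    value (no learned projections). The attended tokens are mean-pooled.
    """
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must share one dimension")
    scale = math.sqrt(dim)
    attended = []
    for query in embeddings:
        # Attention weights of this modality over all modalities.
        weights = softmax([dot(query, key) / scale for key in embeddings])
        attended.append([
            sum(w * value[i] for w, value in zip(weights, embeddings))
            for i in range(dim)
        ])
    # Mean-pool the attended tokens into a single fused vector.
    return [sum(tok[i] for tok in attended) / len(attended) for i in range(dim)]


# Hypothetical 4-dimensional embeddings standing in for encoder outputs.
image_embedding = [0.2, -0.1, 0.4, 0.0]
pose_embedding = [0.1, 0.3, -0.2, 0.5]
state = attention_fuse([image_embedding, pose_embedding])
print(len(state))
```

The fused vector keeps the embedding dimension regardless of how many modalities are passed in, which is one reason a pooled attention output is convenient as a fixed-size input to an RL policy.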

Paper Structure

This paper contains 15 sections, 12 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: (a) Real surgical scenario: the operator controls the insertion of the robotic bronchoscope. (b), (c) The simulation environment, including the 3D airway model and the simulated dual-segment flexible endoscopic robot.
  • Figure 2: The establishment of the simulation environment. (a) The creation of the airway model, including segmentation from preoperative CT scans, bronchial tree extraction, and rendering in the simulator. The airway generations are visualized based on tree branching. (b) The FEM modeling and the action space of the robot. The purple arrow $v_{r}$ denotes the orientation vector of the robot.
  • Figure 3: The architecture of our method. The network takes data from three different modalities as input and outputs the manipulation policy. The entire architecture is trained in stages. In stage I, it encodes multimodal information into low-dimensional embeddings through reconstruction tasks. In stage II, it fuses the multimodal embeddings into a state representation that serves as the input of stage III, with the loss from subsequent tasks used to fine-tune the parameters of the preceding stage's network.
  • Figure 4: RL learning curves for ablation and comparison experiments: (1) BronchoCopilot, (2) BronchoCopilot without visual data, (3) BronchoCopilot without proprioceptive data, (4) BronchoCopilot using concatenation for fusion, (5) BronchoCopilot using summation for fusion. All curves are smoothed by exponential smoothing with a factor of $r = 0.95$.
  • Figure 5: Target positions in the fifth-level airways were selected in the upper left, upper right, and lower left lung lobes. The yellow, green, and pink lines represent the centerlines (reference paths), the BronchoCopilot robot tip trajectories, and the AEA robot tip trajectories, respectively. For visualization, each trajectory is the average of three test runs.
  • ...and 1 more figure
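The Figure 4 caption mentions exponential smoothing with a factor of 0.95. As a rough illustration of what such smoothing does to a learning curve, here is a minimal sketch. It assumes the common recursive form, where each smoothed point is `r` times the previous smoothed point plus `1 - r` times the new raw value, seeded with the first raw value. The caption does not say which variant the authors used, so the function name, the seeding rule, and the sample values below are assumptions.

```python
def exponential_smoothing(values, r=0.95):
    """Return an exponentially smoothed copy of `values`.

    Each output is r * previous_output + (1 - r) * current_value.
    Assumption of this sketch: the series is seeded with its first raw value.
    """
    if not 0.0 <= r < 1.0:
        raise ValueError("r must be in [0, 1)")
    smoothed = []
    previous = None
    for value in values:
        if previous is None:
            previous = value  # seed with the first raw value
        else:
            previous = r * previous + (1.0 - r) * value
        smoothed.append(previous)
    return smoothed


# Hypothetical per-episode returns, not data from the paper.
raw_returns = [0.0, 1.0, 0.0, 1.0, 1.0]
print(exponential_smoothing(raw_returns, r=0.5))
# → [0.0, 0.5, 0.25, 0.625, 0.8125]
```

A factor close to 1, such as 0.95, weights the history heavily, so the plotted curve reacts slowly to individual episodes and mainly shows the long-run trend.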