Safe Navigation for Robotic Digestive Endoscopy via Human Intervention-based Reinforcement Learning

Min Tan; Yushun Tao; Boyun Zheng; GaoSheng Xie; Lijuan Feng; Zeyang Xia; Jing Xiong

TL;DR

This work tackles safe autonomous navigation in robotic digestive endoscopy by introducing HI-PPO, a PPO-based framework that adds Human Intervention (HI) through three components: an Enhanced Exploration Mechanism (EEM), Reward-Penalty Adjustment (RPA), and Behavior Cloning Similarity (BCS). The method models distal bending and continuous steering within a hybrid action space, and uses depth-based target estimation together with a depth-informed reward to drive progress while preserving tissue safety. In Unity simulations across multiple colon anatomies, HI-PPO achieves a mean ATE of about $8.02$ mm and a Security Score $S$ near $0.862$, outperforming standard RL baselines and approaching human expert performance; ablations confirm that each HI component contributes. The results suggest that HI-PPO can improve the safety, efficiency, and reliability of endoscopic navigation, offering a practical path toward clinical translation.
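The depth-based target estimation mentioned above is described in the Figure 2 caption as depth estimation followed by connected component centroid analysis. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the depth map is a 2D list of non-negative values where larger means farther away, and the function name `deepest_region_centroid`, the `fraction` threshold, and the 4-connectivity choice are all hypothetical.

```python
from collections import deque


def deepest_region_centroid(depth, fraction=0.8):
    """Return the (row, col) centroid of the largest connected far region.

    Assumptions (not from the paper): `depth` is a non-empty 2D list of
    non-negative numbers, larger values are farther from the camera, and a
    pixel counts as "far" if it is at least `fraction` of the maximum depth.
    """
    rows, cols = len(depth), len(depth[0])
    threshold = fraction * max(max(row) for row in depth)
    mask = [[depth[r][c] >= threshold for c in range(cols)] for r in range(rows)]
    seen = [[False] * cols for _ in range(rows)]
    best = []  # pixels of the largest component found so far

    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over 4-connected neighbours.
            component = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component

    # Centroid of the largest far region serves as the navigation target.
    cy = sum(p[0] for p in best) / len(best)
    cx = sum(p[1] for p in best) / len(best)
    return cy, cx


toy_depth = [
    [1, 1, 1, 1],
    [1, 9, 9, 1],
    [1, 9, 9, 1],
    [1, 1, 1, 8],
]
target = deepest_region_centroid(toy_depth)
```

In the toy map, the isolated far pixel in the corner is ignored because it forms a smaller component than the central block, so the target falls at the centre of that block.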

Abstract

With the increasing application of automated robotic digestive endoscopy (RDE), ensuring safe and efficient navigation in the unstructured and narrow digestive tract has become a critical challenge. Existing automated reinforcement learning navigation algorithms often produce potentially risky collisions because they lack essential human intervention, which significantly limits the safety and effectiveness of RDE in clinical practice. To address this limitation, we propose a Human Intervention (HI)-based Proximal Policy Optimization (PPO) framework, dubbed HI-PPO, which incorporates expert knowledge to enhance the safety of RDE. Specifically, HI-PPO combines an Enhanced Exploration Mechanism (EEM), Reward-Penalty Adjustment (RPA), and Behavior Cloning Similarity (BCS) to address PPO's exploration inefficiencies for safe navigation in complex gastrointestinal environments. Comparative experiments were conducted on a simulation platform, and the results show that HI-PPO achieves a mean ATE (Average Trajectory Error) of $8.02\ \text{mm}$ and a Security Score of $0.862$, demonstrating performance comparable to human experts. The code will be made publicly available once this paper is published.
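The abstract expands ATE as Average Trajectory Error but does not give its formula here. As a hedged illustration only, the sketch below assumes ATE is the mean Euclidean distance between corresponding points of an executed trajectory and a reference trajectory of equal length; the function name `average_trajectory_error` and the sample points are hypothetical, and the paper's actual definition may differ.

```python
import math


def average_trajectory_error(executed, reference):
    """Mean pointwise Euclidean distance between two trajectories.

    Assumption (not from the paper): both trajectories are sampled at the
    same instants, so point i of `executed` corresponds to point i of
    `reference`. Units follow the inputs (e.g. millimetres).
    """
    if not executed or len(executed) != len(reference):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, q) for p, q in zip(executed, reference))
    return total / len(executed)


# Hypothetical 3D tip positions in millimetres.
executed_path = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
reference_path = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
ate_mm = average_trajectory_error(executed_path, reference_path)
```

Under this assumed definition, a lower value means the executed path stays closer to the reference, which matches how the reported $8.02\ \text{mm}$ figure is used as a navigation accuracy measure.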

Paper Structure (21 sections, 11 equations, 8 figures, 4 tables, 1 algorithm)

Introduction
Related work
RL-Based Autonomous Navigation for Endoscopes
Human Intelligence-Integrated Reinforcement Learning
Methodology
DRL Problem Formulation for RDE Agent
Human Intervention Techniques
HI-PPO Framework Integration
Experimental Verification
Experimental Platform
Human Guidance Implementation
Evaluation Metrics
Implementation Details
Results
Impact of Human Intervention on Training
...and 6 more sections

Figures (8)

Figure 1: Overview framework of the proposed HI-PPO.
Figure 2: Detection of endoscopic navigation target point using depth estimation and connected component centroid analysis.
Figure 3: The colon models used in the experimental phase. (a) Simple colon No.1 for training; (b) complex colon No.2 for training; (c) complex colon No.3 for inference.
Figure 4: Experiment configuration and environmental visualization based on different anatomical segments.
Figure 5: The learning curve of HI-PPO training on colons of different complexity. Two colon models were used. Cumulative rewards are normalized in the range [-1, 1]. The shaded area represents the range of values obtained over 5 training sessions.
...and 3 more figures
