Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion

Huanyu Tian; Martin Huber; Christopher E. Mower; Zhe Han; Changsheng Li; Xingguang Duan; Christos Bergeles


TL;DR

The paper tackles safe, semi-autonomous docking in laparoscopic procedures by combining occlusion-robust pose estimation with a learned hand-eye information fusion framework. It introduces a KalmanNet-based update step within an error-state Kalman filter (ESKF), trained on a self-supervised dataset built from marker-based ground truth, and couples this with an optimization-based co-manipulation controller that enforces translational and rotational compliance. Phantom experiments show improved docking precision and interaction safety: position dispersion falls to 1.23 ± 0.81 mm (from 2.47 ± 1.22 mm in the control group) and force dispersion to 0.78 ± 0.57 N (from 1.15 ± 0.97 N), with docking success rising to 100% in the test group. The approach runs in real time and may apply beyond laparoscopic docking to other minimally invasive procedures; future work targets markerless pose estimation and dynamic camera strategies.

Abstract

In this study, we introduce a novel shared-control system for key-hole docking operations that combines a commercial camera with occlusion-robust pose estimation and a hand-eye information fusion technique. The system is designed to improve docking precision and force-compliance safety. To train the hand-eye information fusion network, we generated a self-supervised dataset using this docking system. After training, our pose estimation method was more accurate than traditional methods, including observation-only approaches, hand-eye calibration, and conventional state estimation filters. In real-world phantom experiments, our approach reduced position dispersion ($1.23 \pm 0.81$ mm vs. $2.47 \pm 1.22$ mm) and force dispersion ($0.78 \pm 0.57$ N vs. $1.15 \pm 0.97$ N) compared to the control group. These advances improve interaction and stability in semi-autonomous co-manipulation scenarios. The study presents an interference-robust, stable, and precise solution with potential applications extending beyond laparoscopic surgery to other minimally invasive procedures.


Paper Structure (25 sections, 21 equations, 7 figures, 1 table)

This paper contains 25 sections, 21 equations, 7 figures, 1 table.

Introduction
Related Works
Contributions
System Overview
Co-manipulation for docking
Vision system and marker design
State Estimation for Docking
Pose estimation
Interactive force estimation
Hand Eye Information Fusion
Training dataset
Hand-eye information fusion network
Hand-eye information fusion network loss functions
Semi-Autonomy Controller
Translational compliance criterion design
...and 10 more sections

Figures (7)

Figure 1: The docking system uses RGB cameras, the robot, and a trocar fitted with markers to carry out the laparoscopic instrument insertion procedure. The pose from the camera to the target (the trocar in our case) is $T_{t}^{c}$. To register the robot, the pose from the camera to the end-effector, $T_{e}^{c}$, is observed and used to infer the hand-eye calibration result, i.e., $T_{c}^{w}$. In this setting, however, camera movement, occlusion of the two markers, and key-point detection failures can produce measurement outliers. Note that the dashed line represents a transformation that is initially unknown without calibration.
Figure 2: Markers are attached to both the robot (end-effector) and the trocar. To reduce the chance of catastrophic occlusions during co-manipulation, a multi-marker ensemble is used. The coordinate frame of the planar ArUco marker board is taken as the world coordinate frame.
Figure 3: The update network for KalmanNet. The inputs of KalmanNet are modified for our ESKF structure. This method infers the covariance of measurements and system models. The output of the network is the Kalman gain, which consists of two independent parts (position gains and orientation gains).
Figure 4: Illustration of disturbances in the dataset and results of the hand-eye-calibration-based robot motion (Rob lines), which serves as ground truth. (a) Partial occlusion of the trocar marker; (b) full occlusion of the trocar marker; (c) occlusion of the end-effector marker; (d) real-time camera movement; (e) example 3D plot of measured positions (Cam) and Rob, showing the robustness of the ground truth in the presence of a single outlier; (f) example 3D plot of measured positions (Cam) and Rob, showing the robustness of the ground truth when frequent outliers exist.
Figure 5: Long-time series fusion experiment (position component). The plots show the predicted positions along the x, y, z axes.
...and 2 more figures
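
The frame relations named in the Figure 1 caption can be illustrated with a short sketch. It assumes that each pose is a 4x4 homogeneous transform, that $T_{b}^{a}$ maps points expressed in frame $b$ into frame $a$, and that the end-effector pose in the world frame (called `T_e_w` here) is available from robot kinematics. These conventions and all function names are illustrative assumptions, not the authors' implementation, and the sketch ignores the outliers and occlusions that the paper's learned fusion is meant to handle.

```python
import numpy as np

def make_pose(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def invert_pose(T):
    """Invert a rigid transform using the closed form (R^T, -R^T p)."""
    R, p = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ p
    return T_inv

def camera_in_world(T_e_w, T_e_c):
    """Hand-eye result under the assumed convention: T_c^w = T_e^w (T_e^c)^-1."""
    return T_e_w @ invert_pose(T_e_c)

def target_in_world(T_c_w, T_t_c):
    """Express the camera-observed target (trocar) in the world frame."""
    return T_c_w @ T_t_c

def rot_z(angle):
    """Rotation about the z axis by `angle` radians (helper for the demo)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic, noise-free demo values (hypothetical, not from the paper).
T_c_w_true = make_pose(rot_z(0.3), [0.5, -0.2, 1.0])
T_e_w = make_pose(rot_z(-0.7), [0.1, 0.4, 0.6])
T_t_w_true = make_pose(rot_z(1.1), [0.3, 0.3, 0.2])

# What the camera would observe for the end-effector and trocar markers.
T_e_c = invert_pose(T_c_w_true) @ T_e_w
T_t_c = invert_pose(T_c_w_true) @ T_t_w_true

T_c_w = camera_in_world(T_e_w, T_e_c)
T_t_w = target_in_world(T_c_w, T_t_c)
```

With noise-free inputs, `T_c_w` and `T_t_w` reproduce the synthetic ground-truth poses. In practice a single-frame estimate like this is sensitive to the disturbances listed in the caption, which is the motivation for fusing it with a filter.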
