Differentiable Rendering-based Pose Estimation for Surgical Robotic Instruments

Zekai Liang, Zih-Yun Chiu, Florian Richter, Michael C. Yip

TL;DR

This work tackles robust, markerless initialization of the camera-to-robot transform for cable-driven surgical robots where joint-angle readings are unreliable. It introduces a differentiable rendering framework that models the instrument with cylinders and operates in a four-DoF pose hypothesis space using a LookAt-based sampling strategy to perform a one-shot calibration. A composite objective, combining a silhouette rendering loss $\mathcal{L}_{render}$ and a geometry loss $\mathcal{L}_{geo}$, guides gradient-based optimization to rapidly converge to accurate pose estimates, even from partial visual information. Real-world experiments on the dVRK show superior calibration consistency over PnP baselines and effective open-loop manipulation, highlighting the approach’s practicality for improving tool tracking in robot-assisted surgery.

Abstract

Robot pose estimation is a challenging and crucial task for vision-based surgical robotic automation. Typical robotic calibration approaches, however, are not applicable to surgical robots such as the da Vinci Research Kit (dVRK), because cable drives introduce joint-angle measurement errors and the kinematic chain is only partially visible. Hence, previous works in surgical robotic automation used tracking algorithms to estimate the pose of the surgical tool in real time and compensate for the joint-angle errors. However, a major limitation of these tracking works is the initialization step, which relied only on keypoints and SolvePnP. In this work, we use differentiable rendering to exploit geometric primitives beyond keypoints, namely cylinders, and construct a versatile pose-matching pipeline in a novel pose hypothesis space. We demonstrate the state-of-the-art performance of our single-shot calibration method in both calibration consistency and real surgical tasks. As a result, this markerless calibration approach proves to be a robust and generalizable initialization step for surgical tool tracking.
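
The optimization strategy summarized above (a weighted sum of two loss terms, minimized by gradient descent from several candidate poses and then refined from the best one) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The two loss functions are toy analytic stand-ins for $\mathcal{L}_{render}$ and $\mathcal{L}_{geo}$, the pose is reduced to a hypothetical (angle, distance) pair, and gradients come from finite differences rather than a differentiable renderer. The weights `w_render` and `w_geo`, the step counts, and all function names are assumptions.

```python
import math

def render_loss(pose, target):
    # Toy stand-in for a silhouette loss (assumption): non-convex in the
    # angle, with local minima away from the true angle.
    x = pose[0] - target[0]
    return (1 - math.cos(x)) + (1 - math.cos(3 * x))

def geo_loss(pose, target):
    # Toy stand-in for a geometry loss (assumption): quadratic in distance.
    return (pose[1] - target[1]) ** 2

def total_loss(pose, target, w_render=1.0, w_geo=1.0):
    # Composite objective: weighted sum of the two terms.
    return w_render * render_loss(pose, target) + w_geo * geo_loss(pose, target)

def numeric_grad(f, pose, eps=1e-5):
    # Central finite differences, standing in for automatic differentiation.
    grad = []
    for i in range(len(pose)):
        hi, lo = list(pose), list(pose)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def descend(pose, target, steps, lr=0.1):
    # Plain gradient descent; returns the final pose and its loss.
    f = lambda p: total_loss(p, target)
    pose = list(pose)
    for _ in range(steps):
        pose = [p - lr * g for p, g in zip(pose, numeric_grad(f, pose))]
    return pose, f(pose)

def calibrate(target, n_angles=8, start_distance=0.2,
              coarse_steps=20, fine_steps=200):
    # Candidate poses on a regular angular grid (hypothetical sampling).
    candidates = [(-math.pi + 2 * math.pi * i / n_angles, start_distance)
                  for i in range(n_angles)]
    # Coarse stage: a few steps per candidate, keep the lowest loss.
    coarse = [descend(c, target, coarse_steps) for c in candidates]
    best_pose, _ = min(coarse, key=lambda result: result[1])
    # Fine stage: refine only the selected candidate.
    return descend(best_pose, target, fine_steps)

pose, loss = calibrate(target=(0.4, 0.12))
print(pose, loss)
```

Because the toy angle term has local minima, a single start can stall at a wrong angle. Running a short optimization from several candidates and refining only the best one is what lets the sketch recover the target.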

Paper Structure

This paper contains 13 sections, 24 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: The motion of the da Vinci surgical robot is constrained by the Remote Center of Motion (RCM). In the endoscopic view, typically only the gripper and insertion shaft are visible. By fully leveraging these visual features, we construct a differentiable rendering-based framework to estimate the globally optimal instrument pose.
  • Figure 2: Our calibration pipeline. We use low-resolution image frames, masks, and inaccurate joint angles as input. We generate a batch of pose candidates from our pose hypothesis space. Coarse batch optimization selects the candidate that converges best. We then refine the selected pose and output the final estimate with calibrated joint angles.
  • Figure 3: We parametrize the initial pose candidates with four independent degrees of freedom, $[\alpha, \beta, \gamma, d]$, covering all potential orientations while leaving out the redundant ones.
  • Figure 4: Our pose hypothesis space efficiently samples all feasible poses in real dVRK manipulation scenes while leaving out redundant ones, which greatly improves calibration speed and accuracy with a limited number of samples.
  • Figure 5: Quantitative results of our method applied to real-world images of a dVRK surgical manipulator. In the last two columns, the best-converging candidate is colored blue and the reference mask is colored orange. We visualize how well the masks and the insertion shaft's edge lines align between the reference and estimated masks. The improved alignment from the third to the fourth column demonstrates that our method produces accurate estimates from the available visual information.
  • ...and 4 more figures
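
The four-DoF parametrization $[\alpha, \beta, \gamma, d]$ and the LookAt-based sampling mentioned in the captions above can be sketched as follows. The source does not define the individual parameters, so this sketch assumes one plausible reading: $\alpha$ is an azimuth, $\beta$ an elevation, $\gamma$ a roll about the viewing axis, and $d$ the distance from the camera to the point it looks at. The function names, the choice of up vector, and the camera axis convention (x right, y down, z forward) are also assumptions, not details from the paper.

```python
import itertools
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def look_at_pose(alpha, beta, gamma, d, up=(0.0, 0.0, 1.0)):
    """Return (R, eye) for a camera at distance d looking at the origin.

    R is a list of three rows: the camera x, y, and z axes expressed in
    the frame of the look-at target (assumed convention).
    """
    # Camera position on a sphere of radius d (azimuth alpha, elevation beta).
    eye = [d * math.cos(beta) * math.cos(alpha),
           d * math.cos(beta) * math.sin(alpha),
           d * math.sin(beta)]
    # The optical axis points from the camera toward the origin.
    forward = normalize([-c for c in eye])
    # Undefined when forward is parallel to up (beta = +/- pi/2).
    right = normalize(cross(forward, up))
    down = cross(forward, right)
    # Roll by gamma about the optical axis.
    c, s = math.cos(gamma), math.sin(gamma)
    x_axis = [c * r + s * dn for r, dn in zip(right, down)]
    y_axis = [-s * r + c * dn for r, dn in zip(right, down)]
    return [x_axis, y_axis, forward], eye

def sample_candidates(alphas, betas, gammas, distances):
    # Regular grid over the four parameters (hypothetical sampling scheme).
    return [look_at_pose(a, b, g, d)
            for a, b, g, d in itertools.product(alphas, betas, gammas, distances)]

candidates = sample_candidates(
    alphas=[2 * math.pi * i / 8 for i in range(8)],
    betas=[-math.pi / 4, 0.0, math.pi / 4],
    gammas=[0.0, math.pi / 2, math.pi, 3 * math.pi / 2],
    distances=[0.1, 0.2],
)
print(len(candidates))
```

In this construction every candidate keeps the look-at point on the optical axis, so only four numbers are needed instead of a full six-DoF pose. Elevations of exactly $\pm\pi/2$ make the up vector parallel to the viewing direction and should be left out of the grid.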