MoSS: Monocular Shape Sensing for Continuum Robots

Chengnan Shentu, Enxu Li, Chaojun Chen, Puspita Triana Dewi, David B. Lindell, Jessica Burgner-Kahrs

TL;DR

MoSSNet addresses real-time 3D shape sensing for continuum robots using a single RGB camera. It introduces an encoder–decoder network with three decoders (centerline, arclength, and importance) and a weighted curve-fitting step that recovers the 3D centerline from a monocular image, achieving a mean shape error of 0.91 mm at 70 fps on real data. The method is validated on both real hardware and simulation, showing strong sim-to-real transfer and robustness to varying camera configurations, and it requires neither fiducial markers nor camera calibration. The work also provides the large MoSS-Real and MoSS-Sim datasets and points toward low-cost, real-time shape sensing for continuum robots in medical and industrial applications.
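The weighted curve-fitting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each decoder emits an H×W map (three channels of 3D coordinates from the centerline decoder, one channel each for relative arclength and importance) and that the centerline is parameterized as a polynomial in relative arclength whose degree is a free, hypothetical choice. The function names `weighted_curve_fit` and `eval_curve` are hypothetical.

```python
import numpy as np


def weighted_curve_fit(centerline, arclength, importance, degree=5):
    """Fit a polynomial 3D curve c(s) to per-pixel decoder outputs (assumed layout).

    centerline: (H, W, 3) predicted 3D point for each pixel
    arclength:  (H, W) predicted relative arclength s in [0, 1]
    importance: (H, W) non-negative per-pixel weight
    Returns coefficients of shape (degree + 1, 3), lowest order first.
    """
    c = centerline.reshape(-1, 3)                    # flatten to (N, 3)
    s = arclength.reshape(-1)                        # (N,)
    w = np.clip(importance.reshape(-1), 0.0, None)   # guard against negative weights
    V = np.vander(s, degree + 1, increasing=True)    # columns 1, s, s^2, ...
    sw = np.sqrt(w)[:, None]
    # Scaling each row by sqrt(w_i) turns weighted least squares into ordinary
    # least squares: minimizes sum_i w_i * ||V_i @ coeffs - c_i||^2.
    coeffs, *_ = np.linalg.lstsq(V * sw, c * sw, rcond=None)
    return coeffs


def eval_curve(coeffs, s):
    """Evaluate the fitted curve at arclengths s; returns (len(s), 3)."""
    V = np.vander(np.asarray(s, dtype=float), coeffs.shape[0], increasing=True)
    return V @ coeffs


def true_curve(s):
    """Known cubic test curve in mm (synthetic, for illustration only)."""
    return np.stack([100 * s, 50 * s**2, 20 * s**3], axis=-1)


# Synthetic check: noisy points on a known curve, plus a block of corrupted
# "background" pixels that receive zero importance.
rng = np.random.default_rng(0)
H, W = 32, 32
s_map = rng.uniform(0.0, 1.0, size=(H, W))
pred_map = true_curve(s_map) + rng.normal(0.0, 0.5, size=(H, W, 3))
importance_map = np.ones((H, W))
pred_map[:4] = 1e3          # corrupted rows ...
importance_map[:4] = 0.0    # ... are ignored by the weighted fit

coeffs = weighted_curve_fit(pred_map, s_map, importance_map, degree=3)
s_eval = np.linspace(0.0, 1.0, 50)
err = np.linalg.norm(eval_curve(coeffs, s_eval) - true_curve(s_eval), axis=1).mean()
print(f"mean error: {err:.3f} mm")
```

Zero-importance pixels contribute nothing to the fit, so spurious predictions outside the robot need no explicit segmentation mask. A closed-form weighted least-squares solve like this is also differentiable, which is consistent with the end-to-end training described in the abstract; a training version would be written in an autodiff framework rather than NumPy.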

Abstract

Continuum robots are promising candidates for interactive tasks in medical and industrial applications due to their unique shape, compliance, and miniaturization capability. Accurate and real-time shape sensing is essential for such tasks yet remains a challenge. Embedded shape sensing has high hardware complexity and cost, while vision-based methods require a stereo setup and struggle to achieve real-time performance. This paper proposes the first eye-to-hand monocular approach to continuum robot shape sensing. Utilizing a deep encoder-decoder network, our method, MoSSNet, eliminates the computation cost of stereo matching and reduces requirements on sensing hardware. In particular, MoSSNet comprises an encoder and three parallel decoders to uncover spatial, length, and contour information from a single RGB image, and then obtains the 3D shape through curve fitting. A two-segment tendon-driven continuum robot is used for data collection and testing, demonstrating accurate (mean shape error of 0.91 mm, or 0.36% of robot length) and real-time (70 fps) shape sensing on real-world data. Additionally, the method is optimized end-to-end and does not require fiducial markers, manual segmentation, or camera calibration. Code and datasets will be made available at https://github.com/ContinuumRoboticsLab/MoSSNet.
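The accuracy figures can be read with a simple metric in mind. The sketch below assumes that mean shape error is the mean Euclidean distance between predicted and ground-truth centerline points sampled at matching arclengths, and that the relative error divides that distance by the robot length; the paper's exact definition may differ, and `mean_shape_error` and `relative_error` are hypothetical names. As a sanity check on the abstract's numbers, 0.91 mm at 0.36% implies a robot length of roughly 0.91 / 0.0036 ≈ 250 mm.

```python
import numpy as np


def mean_shape_error(pred_pts, gt_pts):
    """Mean Euclidean distance between corresponding centerline points.

    Both inputs are (N, 3) arrays sampled at the same N arclengths (assumption).
    Returns the error in the same units as the inputs.
    """
    pred_pts = np.asarray(pred_pts, dtype=float)
    gt_pts = np.asarray(gt_pts, dtype=float)
    if pred_pts.shape != gt_pts.shape:
        raise ValueError("curves must be sampled at the same arclengths")
    return float(np.linalg.norm(pred_pts - gt_pts, axis=-1).mean())


def relative_error(err, robot_length):
    """Express an absolute error as a percentage of robot length."""
    return 100.0 * err / robot_length


# Toy example: a hypothetical straight 250 mm robot along z, with a prediction
# offset by 1 mm in x at every sample point.
s = np.linspace(0.0, 1.0, 100)
gt = np.stack([np.zeros_like(s), np.zeros_like(s), 250.0 * s], axis=-1)
pred = gt + np.array([1.0, 0.0, 0.0])
err = mean_shape_error(pred, gt)
print(err, relative_error(err, 250.0))  # → 1.0 0.4
```

Normalizing by robot length makes errors comparable across robots of different sizes, which is why the abstract reports both the absolute and the relative value.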

Paper Structure (26 sections, 6 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Our method, MoSSNet, takes a single camera image as input and outputs an accurate parametric representation of the robot centerline in real time, without requiring fiducial markers, manual segmentation, or camera calibration.
  • Figure 2: Overview of our approach, MoSSNet. The network takes the captured image of the robot as input and generates reconstruction importance, centerline coordinates, and relative arclength. These flattened outputs are then processed by the weighted curve-fitting algorithm to generate a curve that parameterizes the robot's centerline. To train the network, we penalize the mean squared error between the predicted and ground-truth curves.
  • Figure 3: We collect a monocular shape sensing dataset with a two-segment tendon-driven continuum robot, both on hardware and in simulation: (a) overview of the hardware setup; (b) sample captured image; (c) sample simulated image.
  • Figure 4: MoSSNet's three decoders output interpretable pixel-wise information for 3D curve fitting despite not having pixel-wise supervision. The three decoder outputs, from left to right, provide spatial, length, and contour information about the robot, from which weighted curve fitting obtains an accurate shape.
  • Figure 5: Influence of the amount of real training data and pre-training with simulated data on MERS.