AFT: Appearance-Based Feature Tracking for Markerless and Training-Free Shape Reconstruction of Soft Robots

Shangyuan Yuan, Preston Fairchild, Yu Mei, Xinyu Zhou, Xiaobo Tan

TL;DR

The paper addresses real-time shape sensing for soft robots without markers or task-specific training by introducing Appearance-based Feature Tracking (AFT). AFT pairs a static reference construction stage, which builds a feature-enriched geometric and kinematic model from multi-view images, with an online, markerless reconstruction stage that matches live RGB-D observations to this reference through a hierarchical optimization that decouples local and global deformations. On a continuum robot, the approach achieves an average tip error of 2.6% at 2.5 Hz and remains robust to occlusion, background variation, and viewpoint changes; it is also demonstrated in closed-loop control tasks. The work offers a practical, low-cost pathway for deploying soft robots in unstructured environments and may extend beyond the tested configuration to other soft-robot platforms.
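The headline metric is a tip error expressed as a percentage. The normalization is not defined in this summary; a minimal sketch, assuming the error is the Euclidean distance between reconstructed and motion-capture tip positions, divided by the robot's length and averaged over frames. The function name `relative_tip_error` and the length normalization are assumptions, not the authors' definition.

```python
import numpy as np


def relative_tip_error(pred_tips, gt_tips, robot_length):
    """Mean tip error as a percentage of robot length.

    ASSUMPTION: errors are normalized by the robot's length; the paper
    summary does not state which normalization is used.
    pred_tips, gt_tips: (N, 3) tip positions in the same frame and units.
    """
    pred_tips = np.asarray(pred_tips, dtype=float)
    gt_tips = np.asarray(gt_tips, dtype=float)
    if pred_tips.shape != gt_tips.shape:
        raise ValueError("prediction and ground truth must have the same shape")
    # Per-frame Euclidean distance between reconstructed and reference tips.
    per_frame = np.linalg.norm(pred_tips - gt_tips, axis=1)
    # Average over frames, normalize by robot length, express in percent.
    return 100.0 * per_frame.mean() / robot_length


# Hypothetical data: a 200 mm robot observed over three frames.
gt = [[0, 0, 200], [10, 0, 198], [20, 0, 195]]
pred = [[3, 4, 200], [10, 0, 203], [20, 6, 203]]
print(round(relative_tip_error(pred, gt, robot_length=200.0), 3))  # prints 3.333
```

The per-frame errors here are 5, 5, and 10 mm, so the mean of about 6.67 mm is roughly 3.3% of a 200 mm robot. Separately, the reported 2.5 Hz rate corresponds to one reconstruction roughly every 1/2.5 = 0.4 s.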

Abstract

Accurate shape reconstruction is essential for precise control and reliable operation of soft robots. Compared to sensor-based approaches, vision-based methods offer advantages in cost, simplicity, and ease of deployment. However, existing vision-based methods often rely on complex camera setups, specific backgrounds, or large-scale training datasets, limiting their practicality in real-world scenarios. In this work, we propose a vision-based, markerless, and training-free framework for soft robot shape reconstruction that directly leverages the robot's natural surface appearance. These surface features act as implicit visual markers, enabling a hierarchical matching strategy that decouples local partition alignment from global kinematic optimization. Requiring only an initial 3D reconstruction and kinematic alignment, our method achieves real-time shape tracking across diverse environments while maintaining robustness to occlusions and variations in camera viewpoints. Experimental validation on a continuum soft robot demonstrates an average tip error of 2.6% during real-time operation, as well as stable performance in practical closed-loop control tasks. These results highlight the potential of the proposed approach for reliable, low-cost deployment in dynamic real-world settings.
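The hierarchical strategy separates a local step, which aligns each partition of the robot to its matched surface features, from a global kinematic optimization. The local solver is not specified here; a minimal sketch of the local step only, assuming matched 3D feature pairs per partition are already available and using Kabsch/SVD rigid alignment as a stand-in. The functions `kabsch` and `align_partitions` are hypothetical illustrations, not the authors' code, and the global kinematic stage is not sketched.

```python
import numpy as np


def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that R @ src_i + t ≈ dst_i."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)  # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    # Reflection guard: force a proper rotation with det(R) = +1.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def align_partitions(ref_points, live_points, labels):
    """Hypothetical local stage: one rigid transform per partition.

    ref_points, live_points: (N, 3) matched 3D feature positions.
    labels: (N,) partition index of each match.
    Returns {partition_index: (R, t)}.
    """
    ref_points = np.asarray(ref_points, dtype=float)
    live_points = np.asarray(live_points, dtype=float)
    labels = np.asarray(labels)
    transforms = {}
    for k in np.unique(labels):
        mask = labels == k
        if mask.sum() < 3:  # too few matches for a 3D rigid fit; skip partition
            continue
        transforms[int(k)] = kabsch(ref_points[mask], live_points[mask])
    return transforms


# Synthetic check: partition 1 is rotated 30 degrees about z and lifted 0.5.
rng = np.random.default_rng(0)
ref = rng.normal(size=(40, 3))
labels = np.repeat([0, 1], 20)
theta = np.deg2rad(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
live = ref.copy()
seg = labels == 1
live[seg] = ref[seg] @ Rz.T + np.array([0.0, 0.0, 0.5])
T = align_partitions(ref, live, labels)
R1, _ = T[1]
print(round(float(np.degrees(np.arctan2(R1[1, 0], R1[0, 0]))), 6))  # prints 30.0
```

Fitting each partition rigidly keeps every local problem small and closed-form; the resulting per-partition poses would then serve as inputs to whatever global kinematic model constrains the backbone.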

Paper Structure

This paper contains 24 sections, 12 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Overview of the proposed framework. The framework is divided into two stages: (a) Static Reference Construction, where a reference model with multi-scale features and backbone kinematics is initialized from multi-view images; and (b) Online Shape Reconstruction, where incoming frames are segmented, matched, and used to update the reference model for continuous 3D shape reconstruction.
  • Figure 2: Illustration of multi-scale feature extraction. Features are extracted from Res2, Res3, and Res4 stages of a ResNet-50, providing increasing receptive fields and feature dimensions for hierarchical representation.
  • Figure 3: Illustration of experimental setups used to evaluate robustness under occlusion and varying viewpoints. (a) shows the overall experimental setup, including an RGB-D camera, a soft continuum robot, and a motion capture system. (b)–(d) depict occlusion settings with horizontal bars digitally introduced at different (position, width) values, where the first number denotes the normalized height along the image and the second denotes the relative width (0 indicates no occlusion). (e)–(g) show images captured from different viewpoints: front-right, front-left, and side-left.
  • Figure 4: Performance under different occlusion settings. (a) Relative tip error under varying block positions. (b) Relative tip error under varying block widths.
  • Figure 5: Experimental demonstration of viewpoint robustness. The same target shape is reconstructed from three viewpoints (blue, orange, green) and visualized in a common frame (via calibrated extrinsics), showing close overlap.
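Figure 3 describes occlusion as horizontal bars added digitally at a (position, width) pair. A minimal sketch of such an overlay, assuming `position` is the normalized height of the bar's center (0 = top, 1 = bottom), `width` is its vertical thickness as a fraction of image height, the bar spans the full image width, and it is filled with black. The function name `add_occlusion_bar` and these conventions are assumptions, not the authors' protocol.

```python
import numpy as np


def add_occlusion_bar(image, position, width, fill=0):
    """Overlay a full-width horizontal bar on a copy of `image`.

    ASSUMPTIONS: `position` is the bar center as a fraction of image height,
    `width` is the bar thickness as a fraction of image height, and
    width == 0 means no occlusion.
    """
    if not (0.0 <= position <= 1.0 and 0.0 <= width <= 1.0):
        raise ValueError("position and width must lie in [0, 1]")
    out = np.array(image, copy=True)  # never modify the caller's image
    if width == 0:
        return out
    h = out.shape[0]
    half = width * h / 2.0
    top = max(0, int(round(position * h - half)))
    bottom = min(h, int(round(position * h + half)))
    out[top:bottom, ...] = fill  # covered rows, all columns and channels
    return out


# Hypothetical 100x80 RGB frame, occluded at (position, width) = (0.5, 0.2).
image = np.full((100, 80, 3), 255, dtype=np.uint8)
occluded = add_occlusion_bar(image, position=0.5, width=0.2)
print(int((occluded[:, 0, 0] == 0).sum()))  # prints 20 (rows 40-59 covered)
```

Sweeping `position` at a fixed `width`, and vice versa, mirrors the two panels of Figure 4.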