A three-dimensional force estimation method for the cable-driven soft robot based on monocular images
Xiaohan Zhu, Ran Bu, Zhen Li, Fan Xu, Hesheng Wang
TL;DR
Estimating 3D interaction forces on cable-driven soft robots in real time is difficult with conventional force sensors or 2D visual proxies. The authors present an end-to-end network that fuses monocular RGB images with PWM actuation signals through a 2D-3D feature fusion module, a unified feature representation, and an LSTM-based time-series decoder. Key contributions include bridging the image–force dimensional gap via depth estimation and segmentation, learning PWM-conditioned feature representations with cross-attention, and leveraging temporal context to mitigate hysteresis. The approach achieves marker-free, real-time 3D tip-force estimation on a four-cable soft robot, enabling safer and more capable interactive manipulation in practical tasks.
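To make the idea of PWM-conditioned features more concrete, the following is a minimal NumPy sketch of one cross-attention step in which image feature tokens attend to tokens derived from the PWM signals. It is not the authors' implementation: the token counts, the feature width `d`, the per-cable lifting in `lift_pwm`, the residual connection, and all weight matrices are assumptions chosen purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lift_pwm(pwm, W_lift, b_lift):
    # Hypothetical lifting step: each cable's scalar PWM value scales a
    # per-cable embedding, giving one d-dimensional token per cable.
    return pwm[:, None] * W_lift + b_lift          # (n_cables, d)

def cross_attention(img_tokens, pwm_tokens, Wq, Wk, Wv):
    # Queries come from image tokens; keys and values come from PWM tokens.
    q = img_tokens @ Wq                            # (n_img, d)
    k = pwm_tokens @ Wk                            # (n_cables, d)
    v = pwm_tokens @ Wv                            # (n_cables, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])        # (n_img, n_cables)
    attn = softmax(scores, axis=-1)
    # Residual connection (an assumption): image features plus PWM context.
    return img_tokens + attn @ v, attn

rng = np.random.default_rng(0)
d, n_img, n_cables = 16, 49, 4                     # assumed sizes; four cables as in the paper
img_tokens = rng.normal(size=(n_img, d))           # stand-in for image features
pwm = np.array([0.2, 0.8, 0.5, 0.1])               # made-up normalized PWM values
W_lift = rng.normal(size=(n_cables, d))
b_lift = rng.normal(size=(n_cables, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

pwm_tokens = lift_pwm(pwm, W_lift, b_lift)
fused, attn = cross_attention(img_tokens, pwm_tokens, Wq, Wk, Wv)
print(fused.shape, attn.shape)
```

In a trained network the weights would be learned and the fused tokens would feed a temporal decoder such as the LSTM mentioned above; here they are random and serve only to show the data flow and tensor shapes.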
Abstract
Soft manipulators are well suited to interaction tasks with high safety demands, e.g., robot-assisted surgery and elderly care. Yet the difficulty of obtaining real-time contact feedback has hindered their application to precise manipulation. This paper proposes an end-to-end network to estimate the 3D contact force of a soft robot, with the aim of enhancing its capabilities in interactive tasks. The method takes monocular images fused with multidimensional actuation information directly as network inputs. Compared with related studies that use 3D shape information as network inputs, this simplifies the preprocessing of raw data and consequently reduces configuration reconstruction errors. A unified feature representation module is devised to elevate low-dimensional features from the system's actuation signals to the same level as image features, facilitating smoother integration of multimodal information. The proposed method has been experimentally validated on a soft robot testbed, achieving a mean relative error of 0.84% in 3D force estimation, compared with the best-reported result of 2.2% in related work.
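The abstract does not state how the mean relative error is normalized. As a rough illustration only, the sketch below assumes one common convention: the Euclidean norm of the 3D force error divided by the magnitude of the ground-truth force, averaged over samples. The function name `mean_relative_error` and the sample values are hypothetical, and the paper may instead normalize by a full-scale force range or per axis.

```python
import numpy as np

def mean_relative_error(f_pred, f_true, eps=1e-9):
    # f_pred, f_true: arrays of shape (n_samples, 3) holding 3D forces.
    f_pred = np.asarray(f_pred, dtype=float)
    f_true = np.asarray(f_true, dtype=float)
    err = np.linalg.norm(f_pred - f_true, axis=1)  # per-sample error magnitude
    ref = np.linalg.norm(f_true, axis=1)           # per-sample true force magnitude
    # eps guards against division by zero when the true force is zero.
    return float(np.mean(err / (ref + eps)))

# Made-up example forces (not data from the paper).
f_true = np.array([[0.0, 0.0, 1.0], [0.3, 0.4, 0.0]])
f_pred = np.array([[0.0, 0.0, 1.01], [0.3, 0.4, 0.005]])

mre = mean_relative_error(f_pred, f_true)
print(f"mean relative error: {100 * mre:.2f}%")
```

Under this convention, samples with near-zero true force dominate the average, which is one reason a full-scale normalization is sometimes preferred for force sensing.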
