Force-EvT: A Closer Look at Robotic Gripper Force Measurement with Event-based Vision Transformer
Qianyu Guo, Ziqing Yu, Jiaming Fu, Yawen Lu, Yahya Zweiri, Dongming Gan
TL;DR
The paper addresses robust force sensing for soft robotic grippers under challenging visual conditions by leveraging event-based vision and Vision Transformer regression. It introduces Force-EvT, which converts asynchronous event streams into time-framed inputs and uses a ViT backbone to predict contact forces, supported by the RG-Event dataset of 1000 labeled event frames. The approach achieves an RMSE of $0.13\ \mathrm{N}$ and $R^2=0.93$, outperforming a prior marker-based method by about $13.0\%$, and demonstrates potential for real-time, sensorless force estimation in low-light and high-speed scenarios. This work advances manipulation reliability for soft grippers and opens avenues for more robust force control in dynamic environments.
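The step of converting an asynchronous event stream into time-framed inputs can be illustrated with a minimal sketch. It assumes each event is a `(t_us, x, y, polarity)` tuple with polarity in {+1, -1} and accumulates signed polarities into fixed-duration windows. The function name `events_to_frames`, the tuple layout, and the window length are hypothetical choices for illustration, not the paper's actual preprocessing.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp in microseconds, x, y, polarity in {+1, -1}).
Event = Tuple[int, int, int, int]


def events_to_frames(events: List[Event], width: int, height: int,
                     window_us: int) -> List[List[List[int]]]:
    """Accumulate signed event polarities into fixed-duration 2D frames.

    Frame k covers timestamps in [t0 + k * window_us, t0 + (k + 1) * window_us),
    where t0 is the earliest event timestamp. Windows without events stay all-zero.
    """
    if window_us <= 0:
        raise ValueError("window_us must be positive")
    if not events:
        return []
    ordered = sorted(events)  # tuples sort by timestamp first
    t0 = ordered[0][0]
    frames: List[List[List[int]]] = []
    for t, x, y, p in ordered:
        idx = (t - t0) // window_us
        # Allocate empty frames up to the window this event falls into.
        while len(frames) <= idx:
            frames.append([[0] * width for _ in range(height)])
        frames[idx][y][x] += p  # row = y, column = x
    return frames
```

For example, `events_to_frames([(0, 1, 0, 1), (500, 1, 0, 1), (1200, 0, 1, -1)], width=2, height=2, window_us=1000)` yields two frames: the first holds `+2` at pixel `(x=1, y=0)` and the second holds `-1` at `(x=0, y=1)`. Signed accumulation is only one common representation; separate per-polarity channels or time surfaces are alternatives a ViT input pipeline might use instead.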
Abstract
Robotic grippers are receiving increasing attention in various industries as essential robot components for interacting with and manipulating objects. While significant progress has been made in the past, conventional rigid grippers still have limitations in handling irregular objects and can damage fragile objects. We have shown that soft grippers offer deformability to adapt to a variety of object shapes and maximize object protection. At the same time, unlike RGB cameras, which perform poorly in low-light and fast-moving environments, dynamic vision sensors (e.g., event-based cameras) capture small changes in brightness and stream them asynchronously as events. In this paper, we propose a dynamic-vision-based algorithm to measure the force applied to the gripper. In particular, we first set up a DVXplorer Lite series event camera to capture twenty-five sets of event data. Second, motivated by the impressive performance of the Vision Transformer (ViT) algorithm in dense image prediction tasks, we propose a new approach that demonstrates the potential for real-time force estimation and meets the requirements of real-world scenarios. We extensively evaluate the proposed algorithm on a wide range of scenarios and settings and show that it consistently outperforms recent approaches.
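How an event camera turns "small changes in brightness" into asynchronous events can be sketched with the idealized contrast-threshold model commonly used to describe dynamic vision sensors: a pixel emits an event of polarity ±1 each time its log intensity moves by a threshold C away from a stored reference level. The sketch below simulates this for a single pixel. The function name, the sampled-intensity input, and the default threshold of 0.2 are assumptions for illustration; real sensors such as the DVXplorer Lite also exhibit noise, refractory periods, and per-pixel threshold mismatch, which this model ignores.

```python
import math
from typing import List, Tuple


def simulate_pixel_events(samples: List[Tuple[float, float]],
                          threshold: float = 0.2) -> List[Tuple[float, int]]:
    """Idealized DVS model for one pixel (hypothetical helper).

    samples: (time, intensity) pairs with strictly positive intensity.
    Returns (time, polarity) events; polarity is +1 for brightening, -1 for darkening.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if not samples:
        return []
    events: List[Tuple[float, int]] = []
    ref = math.log(samples[0][1])  # reference log intensity at the last event
    for t, intensity in samples[1:]:
        delta = math.log(intensity) - ref
        # A large change can cross the threshold several times, emitting several events.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append((t, polarity))
            ref += polarity * threshold
            delta = math.log(intensity) - ref
    return events
```

With samples `[(0, 1.0), (1, 1.5), (2, 1.5), (3, 0.9)]` and the default threshold, the pixel emits two positive events at `t=1`, none at `t=2` (no change), and two negative events at `t=3`. A static scene produces no output, which is why such sensors stream sparse data rather than full frames.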
