
Force-EvT: A Closer Look at Robotic Gripper Force Measurement with Event-based Vision Transformer

Qianyu Guo, Ziqing Yu, Jiaming Fu, Yawen Lu, Yahya Zweiri, Dongming Gan

TL;DR

The paper addresses robust force sensing for soft robotic grippers under challenging visual conditions by combining event-based vision with Vision Transformer (ViT) regression. It introduces Force-EvT, which converts asynchronous event streams into time-framed inputs and uses a ViT backbone to predict contact forces, supported by the RG-Event dataset of 1000 labeled event frames. The approach achieves an RMSE of $0.13\ \mathrm{N}$ and $R^2=0.93$, outperforming a prior marker-based method by about $13.0\%$, and demonstrates potential for real-time, sensorless force estimation in low-light and high-speed scenarios. This work advances manipulation reliability for soft grippers and opens avenues for more robust force control in dynamic environments.
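For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how RMSE and $R^2$ are conventionally computed for a regression task such as force prediction. The function names and the sample values in the usage note are illustrative assumptions, not taken from the paper or its code.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error, in the same unit as the targets (e.g. newtons)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot
```

As a usage example with made-up values, `rmse([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])` evaluates to 1.0, and `r_squared` on the same inputs evaluates to roughly 0.2, since a constant offset of one unit leaves most of the target variance unexplained.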

Abstract

Robotic grippers are receiving increasing attention in various industries as essential components of robots for interacting and manipulating objects. While significant progress has been made in the past, conventional rigid grippers still have limitations in handling irregular objects and can damage fragile objects. We have shown that soft grippers offer deformability to adapt to a variety of object shapes and maximize object protection. At the same time, dynamic vision sensors (e.g., event-based cameras) are capable of capturing small changes in brightness and streaming them asynchronously as events, unlike RGB cameras, which do not perform well in low-light and fast-moving environments. In this paper, a dynamic-vision-based algorithm is proposed to measure the force applied to the gripper. In particular, we first set up a DVXplorer Lite series event camera to capture twenty-five sets of event data. Second, motivated by the impressive performance of the Vision Transformer (ViT) algorithm in dense image prediction tasks, we propose a new approach that demonstrates the potential for real-time force estimation and meets the requirements of real-world scenarios. We extensively evaluate the proposed algorithm on a wide range of scenarios and settings, and show that it consistently outperforms recent approaches.

Paper Structure (14 sections, 1 equation, 5 figures, 1 table)

Figures (5)

  • Figure 1: Comparison of frame-based RGB camera and event camera outputs in experimental scenarios, revealing significant distinctions in how the two types of cameras capture and process visual information.
  • Figure 2: Overview of the proposed method, Force-EvT. The output of the event camera is converted into event frames over a fixed time interval $T$. The event frames are then processed by a Vision Transformer (ViT) network, which predicts the force applied to the robotic gripper.
  • Figure 3: Experimental setup to conduct data collection using an event camera, a force sensor, and robotic grippers.
  • Figure 4: The gripper as captured by an RGB camera and an event camera. (a) shows our designed soft robotic gripper captured by a standard RGB camera with no applied force; (b) and (c) show the gripper under different deformation states, as captured by an event camera.
  • Figure 5: The prediction results demonstrate the efficacy and accuracy of Force-EvT in the force measurement task.
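
The event-to-frame conversion described in the Figure 2 caption can be sketched as follows. This is a minimal illustration only: the event layout `(timestamp_us, x, y, polarity)`, the signed-count accumulation, and the function name `events_to_frames` are assumptions made for the example, and the paper's actual frame representation may differ.

```python
from typing import List, Sequence, Tuple

# Hypothetical event layout: (timestamp in microseconds, x, y, polarity).
Event = Tuple[int, int, int, int]


def events_to_frames(
    events: Sequence[Event], width: int, height: int, interval_us: int
) -> List[List[List[int]]]:
    """Accumulate events into signed count frames, one per interval T.

    Each frame is a height x width grid. An event with polarity > 0 adds +1
    at its pixel and any other polarity adds -1 (an assumed convention).
    """
    if interval_us <= 0:
        raise ValueError("interval_us must be positive")
    if not events:
        return []

    ordered = sorted(events, key=lambda e: e[0])  # order by timestamp
    t0 = ordered[0][0]
    n_frames = (ordered[-1][0] - t0) // interval_us + 1
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]

    for t, x, y, polarity in ordered:
        if not (0 <= x < width and 0 <= y < height):
            continue  # ignore events outside the sensor area
        frame_index = (t - t0) // interval_us
        frames[frame_index][y][x] += 1 if polarity > 0 else -1
    return frames


# Usage: five synthetic events on a 3x2 sensor, binned with T = 1000 us.
sample_events = [
    (0, 0, 0, 1),
    (500, 1, 0, 0),
    (1000, 0, 0, 1),
    (1500, 0, 0, 1),
    (2500, 2, 1, 0),
]
sample_frames = events_to_frames(sample_events, width=3, height=2, interval_us=1000)
```

Integer microsecond timestamps are used here so that the interval index is computed with exact integer division. A frame produced this way would then be resized and passed to the ViT regressor, a step this sketch does not cover.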