Learning Stable Robot Grasping with Transformer-based Tactile Control Policies

En Yen Puang; Zechen Li; Chee Meng Chew; Shan Luo; Yan Wu

TL;DR

This work reframes stable grasping by allowing the object's center of gravity to shift during an episode and by adding gripping force as a control dimension. It uses a model-free, end-to-end Transformer-based reinforcement learning policy, trained with SAC, that processes spatiotemporal tactile maps and outputs continuous changes in grasp location and grip force. A multi-objective reward penalizes rotation and slippage during the early steps of an episode and excess grip force at terminal states, with the balance set by a trade-off parameter $\alpha$. The method achieves near-perfect success in simulation and transfers zero-shot to real hardware under varied load configurations, demonstrating practical robustness and giving insight into the trade-off between minimizing the number of re-grasp attempts and minimizing grip force. The results suggest that Transformer architectures handle irregularly timed tactile data better than CNN baselines, motivating further research on control with realistic tactile feedback.
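The TL;DR does not state the reward equation itself. As a hypothetical illustration of how a trade-off parameter $\alpha$ can separate the two objectives, the sketch below penalizes rotation and slip on non-terminal steps and excess grip force on the terminal step. `StepInfo`, its fields, `success_bonus`, and the unweighted sum of penalties are all assumptions for illustration, not the paper's formulation.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step signals from a simulator (illustrative names only)."""
    rotation: float    # rotational displacement of the load during the lift
    slip: float        # slippage detected during the lift
    grip_force: float  # commanded gripping force (N)
    min_force: float   # smallest force that would still hold the load (N), assumed known in sim
    terminal: bool     # True on the final step, once a stable grasp is reached


def reward(info: StepInfo, alpha: float = 0.5, success_bonus: float = 1.0) -> float:
    """Hypothetical two-objective reward with trade-off parameter alpha.

    Non-terminal steps: penalize rotation and slip (stability objective).
    Terminal step: a bonus minus alpha times the grip force in excess of
    what is needed (minimum-force objective).
    """
    if not info.terminal:
        # Unweighted sum for illustration; a real reward would scale each term.
        return -(info.rotation + info.slip)
    excess_force = max(0.0, info.grip_force - info.min_force)
    return success_bonus - alpha * excess_force
```

Under this assumed form, raising `alpha` pushes the policy toward lighter final grips, which in turn raises the risk of slip and extra re-grasp attempts.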

Abstract

Measuring grasp stability is an important skill for dexterous robot manipulation, and stability can be inferred from haptic information captured by a tactile sensor. Control policies have to detect rotational displacement and slippage from tactile feedback and determine a re-grasp strategy in terms of location and force. The classic stable grasp task only trains control policies to solve for the re-grasp location, using objects with a fixed center of gravity. In this work, we propose a revamped version of the stable grasp task that optimises both re-grasp location and gripping force for objects with an unknown and moving center of gravity. We tackle this task with a model-free, end-to-end Transformer-based reinforcement learning framework. We show that, after training, our approach solves both objectives in simulation and in a real-world setup with zero-shot transfer. We also provide a performance analysis of different models to understand the dynamics of optimizing two opposing objectives.
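To make the two-dimensional re-grasp action concrete, here is a minimal sketch, assuming a 1-D grasp position along the object and a policy output in $[-1, 1]^2$ (SAC implementations commonly squash actions with tanh into this range). The policy predicts changes to location and force, which are scaled, added to the current command, and clipped. `GraspCommand`, `apply_action`, `POS_LIMITS`, `FORCE_LIMITS`, `max_dpos`, and `max_dforce` are hypothetical names and values, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GraspCommand:
    position_m: float  # grasp location along the object's long axis (hypothetical 1-D parameterization)
    force_n: float     # gripping force in newtons


# Hypothetical workspace and actuator limits; not from the paper.
POS_LIMITS = (-0.10, 0.10)
FORCE_LIMITS = (2.0, 20.0)


def clip(x: float, lo: float, hi: float) -> float:
    """Clamp x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def apply_action(cmd: GraspCommand, action: Tuple[float, float],
                 max_dpos: float = 0.02, max_dforce: float = 2.0) -> GraspCommand:
    """Turn a policy output of (delta location, delta force) into the next command."""
    # Guard against out-of-range outputs before scaling them into physical deltas.
    a_pos, a_force = (clip(a, -1.0, 1.0) for a in action)
    new_pos = clip(cmd.position_m + a_pos * max_dpos, *POS_LIMITS)
    new_force = clip(cmd.force_n + a_force * max_dforce, *FORCE_LIMITS)
    return GraspCommand(new_pos, new_force)
```

Predicting deltas rather than absolute targets keeps each re-grasp close to the previous one, so the policy can refine location and force incrementally as the center of gravity shifts.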

Paper Structure (17 sections, 1 equation, 9 figures, 2 tables)

Introduction
Related Work
Handling rotational slip
Solving stable grasp task
Simulating tactile feedback
Methodology
Problem Definition
End-to-end RL framework
Features from tactile feedback
Design of multi-objective reward
Experiments and Results
Experimental setup
CNN baseline model
Performance Metrics
Trade-off between minimum attempt and minimum force
...and 2 more sections

Figures (9)

Figure 1: The classic stable grasp task kolamuri2021improving, extended with a non-static load, variable weight, force control and tactile feedback (blue arrows). Sliding of the load during step $t$ (left) has to be taken into account for the re-grasp in step $t+1$ (right).
Figure 2: Visualization of tactile maps under 4 scenarios: (top left) slippage due to insufficient gripping force; (bottom left) successful grasp at the correct location; (right, top and bottom) grasps on the two opposite sides.
Figure 3: Average shear-force magnitude of the tactile sensor (blue) over the three stages of a grasp and lift in which sliding (green) occurs. The plot shows the phases in the temporal tactile data that the control policy has to pick up.
Figure 4: The Transformer-based control policy consists of: 1. a shared CNN projection block that projects each tactile map into a token embedding; 2. positional encoding that embeds timestamp information into the tokens; 3. a learnable readout token that generates the policy output.
Figure 5: The Transformer encoder is a stack of $N=8$ multi-head cross-attention blocks. The token and MLP dimensions are 32 and 128, respectively.
...and 4 more figures
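The Figure 4 and Figure 5 captions describe per-frame tactile tokens, a timestamp-based positional encoding, and a readout token, with a token dimension of 32. The captions do not specify the encoding scheme, so the sketch below assumes a standard sinusoidal encoding evaluated at real-valued timestamps, which lets irregularly sampled frames carry their actual timing. It also assumes the readout token is prepended, as with a [CLS]-style token. `timestamp_encoding`, `build_sequence`, and `base` are hypothetical names; the CNN projection and the encoder itself are omitted.

```python
import math
from typing import List, Sequence

D_MODEL = 32  # token dimension stated in the Figure 5 caption


def timestamp_encoding(t: float, d_model: int = D_MODEL, base: float = 10000.0) -> List[float]:
    """Sinusoidal encoding of a real-valued timestamp (assumed scheme; d_model must be even).

    Unlike integer positions, t can be any float, so frames sampled at
    irregular intervals receive encodings that reflect their true timing.
    """
    enc: List[float] = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (base ** (i / d_model))
        enc.append(math.sin(t * freq))
        enc.append(math.cos(t * freq))
    return enc


def build_sequence(tokens: Sequence[Sequence[float]], timestamps: Sequence[float],
                   readout: Sequence[float]) -> List[List[float]]:
    """Add timestamp encodings to per-frame tokens and prepend a readout token.

    `tokens` stand in for the CNN projections of each tactile map, and
    `readout` stands in for the learnable readout token (a fixed vector here).
    """
    if len(tokens) != len(timestamps):
        raise ValueError("need exactly one timestamp per token")
    seq = [list(readout)]
    for tok, t in zip(tokens, timestamps):
        pe = timestamp_encoding(t, len(tok))
        seq.append([x + p for x, p in zip(tok, pe)])
    return seq
```

In a full model the readout vector would be a trainable parameter, and the encoder output at its position would be one plausible input to the SAC actor and critic heads.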
