
Focused Blind Switching Manipulation Based on Constrained and Regional Touch States of Multi-Fingered Hand Using Deep Learning

Satoshi Funabashi, Atsumu Hiramoto, Naoya Chiba, Alexander Schmitz, Shardul Kulkarni, Tetsuya Ogata

TL;DR

The paper tackles fine-grained, tactile-guided multi-finger manipulation by introducing an AE-LSTM architecture that compresses abundant tactile information via autoencoders and generates time-series motions with an LSTM. A constrained loss term plus an attention mechanism guides the model to switch between sub-tasks based on touch and proprioceptive cues, enabling robust cap-opening across untrained objects and positions. Empirical results show that combining loss constraints with adaptive attention yields the highest complete and partial success rates, with attention localizing modality relevance per sub-task and PCA analysis revealing latent-loop dynamics that reflect effective switching. The work advances dexterous manipulation by integrating tactile-centric feature learning, temporal prediction, and adaptive multimodal emphasis, suggesting practical impact for real-time manipulation with multi-fingered hands.

Abstract

To achieve a desired grasping posture (including object position and orientation), multi-finger motions need to be conducted according to the current touch state. Specifically, when subtle changes occur while correcting the object state, not only proprioception but also tactile information from the entire hand can be beneficial. However, switching motions with the high DOFs of multiple fingers and abundant tactile information remains challenging. In this study, we propose a loss function with constraints on touch states and an attention mechanism that focuses on important modalities depending on the touch states. The policy model is an AE-LSTM, which consists of an Autoencoder (AE) that compresses abundant tactile information and a Long Short-Term Memory (LSTM) network that switches the motion depending on the touch states. Cap-opening was chosen as the target task; it consists of two subtasks, sliding an object and opening its cap. As a result, the proposed method achieved the best success rates on a variety of objects in real-time cap-opening manipulation. Furthermore, we confirmed that the proposed model acquired the features of each subtask and placed attention on specific modalities.
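
The modality attention described in the abstract can be illustrated with a minimal sketch. It assumes four modality groups (whole-hand tactile features, thumb tactile features, joint angles, torques) and a softmax over one score per modality; the function names, the dictionary layout, and the hand-written scores are all hypothetical. In the paper the attention is produced by a learned layer inside the AE-LSTM, so this only shows the weighting step, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_modality_attention(features, scores):
    """Scale each modality's feature vector by its attention weight.

    features: dict mapping a modality name to a list of floats.
    scores:   dict mapping the same names to an unnormalized score
              (assumption: one scalar score per modality).
    Returns (weighted_features, attention_weights).
    """
    names = list(features)
    weights = softmax([scores[name] for name in names])
    attention = dict(zip(names, weights))
    weighted = {
        name: [attention[name] * value for value in features[name]]
        for name in names
    }
    return weighted, attention


# Hypothetical inputs: placeholder features and scores, not measured data.
features = {
    "tactile_hand": [0.2, 0.4],
    "tactile_thumb": [0.9],
    "joint": [0.1, 0.3],
    "torque": [0.5],
}
scores = {"tactile_hand": 0.0, "tactile_thumb": 2.0, "joint": 0.0, "torque": 0.0}
weighted, attention = apply_modality_attention(features, scores)
```

With these placeholder scores the thumb tactile features receive the largest weight, which mirrors the idea of emphasizing one modality during a given subtask. The weighted features would then be concatenated and passed to the LSTM.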

Paper Structure

This paper contains 21 sections, 3 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Schematic of the proposed motion-generating method. The model combines AE and LSTM blocks so that it can apply an attention mechanism and handle time-series information.
  • Figure 2: Detailed proposed AE-LSTM architecture. Separate AEs are prepared for tactile information from the whole hand and from a local part of the hand (the thumb in this study). The tactile features, together with joint and torque information, are input to the attention layer so that the proposed network can focus on arbitrary modalities depending on the sub-task. The attention layer outputs the attention-weighted features, which are input to the LSTM. The LSTM predicts the sensor information (joint, torque and tactile features) at the next time step. The predicted joint information is used to control the Allegro Hand.
  • Figure 3: Target objects and initial grasping positions. Trained objects and positions: five daily objects were prepared. We put five red markers around the cap as initial positions for the trained objects. The initial position is set to coincide with the position of the index finger. Testing positions were also prepared for evaluating real-time manipulation: four markers placed in between the training markers. Untrained objects: ten objects were prepared for evaluating the generalization ability of the proposed method. Three red markers were put as initial positions for testing.
  • Figure 4: Target manipulation motion. First, the fingers are closed. Then the robot tries to open the cap. If the cap opens, the robot stops; otherwise, the robot slides the grasped object either right or left and tries to open the cap again, repeating as needed.
  • Figure 5: Failures made by the comparison models. The top row shows motion generated by the model without the constrained loss: it could not recognize that the cap had been opened. The bottom row shows motion generated by the model without the attention mechanism: it could not slide the object far enough, the thumb got stuck at the rim of the object, and the opening motion failed.
  • ...and 2 more figures
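
The target motion in the Figure 4 caption (close the fingers, try to open, slide and retry on failure) can be written as a small control loop. This is only a sketch of the intended behavior: the callback names and the `max_attempts` limit are assumptions, and in the paper the switching between subtasks is generated by the learned AE-LSTM from touch states rather than by hand-written branching.

```python
def cap_opening_loop(close_fingers, try_open, slide, max_attempts=5):
    """Run the open / slide / retry cycle described for the target task.

    close_fingers: callable that closes the fingers around the object.
    try_open:      callable returning True if the cap was opened.
    slide:         callable that slides the grasped object left or right.
    max_attempts:  hypothetical safety limit, not taken from the paper.
    Returns the number of opening attempts used, or None on failure.
    """
    close_fingers()
    for attempt in range(1, max_attempts + 1):
        if try_open():
            return attempt  # cap opened, so the robot stops
        slide()  # reposition the object before the next attempt
    return None


class FakeHand:
    """Toy stand-in for a robot hand: the cap opens on the third attempt."""

    def __init__(self, opens_on=3):
        self.opens_on = opens_on
        self.open_attempts = 0
        self.slides = 0
        self.closed = False

    def close_fingers(self):
        self.closed = True

    def try_open(self):
        self.open_attempts += 1
        return self.open_attempts >= self.opens_on

    def slide(self):
        self.slides += 1


hand = FakeHand(opens_on=3)
attempts_used = cap_opening_loop(hand.close_fingers, hand.try_open, hand.slide)
```

The first failure in the Figure 5 caption corresponds to `try_open` never reporting success even though the cap is open, and the second to `slide` not moving the object far enough for the next attempt to succeed.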