APriCoT: Action Primitives based on Contact-state Transition for In-Hand Tool Manipulation

Daichi Saito, Atsushi Kanehira, Kazuhiro Sasabuchi, Naoki Wake, Jun Takamatsu, Hideki Koike, Katsushi Ikeuchi

TL;DR

APriCoT introduces a primitive-based deep reinforcement learning (DRL) framework for in-hand tool manipulation. It decomposes long-horizon contact-state transitions into short-term detach, crossover, and attach actions that are learned as reusable policies. It builds a contact-state transition graph with stability and manipulability criteria, and employs a teacher-student training paradigm to learn robust primitive policies that can be chained to achieve a final grasp after a half-turn rotation. Experimental results in a high-fidelity simulator show successful rotation and grasp across varied object shapes, outperforming baselines that target only rotation or pinchable objects, while exhibiting strong robustness and latent-shape awareness. This approach offers improved sample efficiency and a scalable path toward real-world, diverse in-hand manipulations by reusing primitives and potentially integrating visual observations and hierarchical control.

Abstract

In-hand tool manipulation is an operation that not only manipulates a tool within the hand (i.e., in-hand manipulation) but also achieves a grasp suitable for a task after the manipulation. This study aims to acquire an in-hand tool manipulation skill through deep reinforcement learning. The skill is difficult to learn because the manipulation requires (A) exploring long-term contact-state changes to achieve the desired grasp and (B) highly varied motions depending on the contact-state transition. (A) makes the reward for a successful grasp sparse, and (B) requires an RL agent to explore widely within the state-action space to learn highly varied actions, leading to sample inefficiency. To address these issues, this study proposes Action Primitives based on Contact-state Transition (APriCoT). APriCoT decomposes the manipulation into short-term action primitives by describing the operation as a contact-state transition based on three action representations (detach, crossover, attach). Within each action primitive, the fingers only need to perform short-term, similar actions. Training a separate policy for each primitive mitigates issues (A) and (B). This study focuses on a fundamental operation as an example of in-hand tool manipulation: rotating an elongated object held in a precision grasp by half a turn so that the initial grasp is achieved again. Experimental results demonstrated that the proposed method succeeded in both the rotation and the achievement of the desired grasp, unlike existing studies. Additionally, the policy was found to be robust to changes in object shape.
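The decomposition described in the abstract can be pictured with a small sketch. It is only an illustration under stated assumptions, not the authors' implementation: a contact state is modeled as the set of fingers touching the object, `ContactState` and `apply_primitive` are hypothetical names, and a primitive is assumed to be one detach, crossover, attach sequence for a single finger. Because this text does not define crossover precisely, it is modeled here as a repositioning of a free finger that leaves the contact set unchanged.

```python
from dataclasses import dataclass

# Finger labels as used in the paper's figures: index, middle, ring, thumb.
FINGERS = ("I", "M", "R", "T")


@dataclass(frozen=True)
class ContactState:
    """Hypothetical model: the set of fingers currently touching the object."""
    in_contact: frozenset

    def detach(self, finger):
        # Detach removes one contacting finger from the contact set.
        if finger not in self.in_contact:
            raise ValueError(f"{finger} is not in contact")
        return ContactState(self.in_contact - {finger})

    def crossover(self, finger):
        # Assumption: only a free finger can cross over, and doing so
        # repositions it without changing which fingers are in contact.
        if finger in self.in_contact:
            raise ValueError(f"{finger} must be detached before crossover")
        return self

    def attach(self, finger):
        # Attach places a free finger back on the object.
        if finger in self.in_contact:
            raise ValueError(f"{finger} is already in contact")
        return ContactState(self.in_contact | {finger})


def apply_primitive(state, finger):
    """Run one assumed primitive and return the intermediate contact states."""
    detached = state.detach(finger)
    crossed = detached.crossover(finger)
    attached = crossed.attach(finger)
    return [detached, crossed, attached]


# Start from the state in which all fingers are in contact.
full_contact = ContactState(frozenset(FINGERS))
trajectory = apply_primitive(full_contact, "I")
```

In this toy model every primitive starts and ends in the all-fingers-in-contact state, so each policy only has to cover a short, similar motion, which is the property the abstract relies on to ease issues (A) and (B).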

Paper Structure

This paper contains 20 sections, 4 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: An example of a task after in-hand tool manipulation: pouring the contents of a box into a cup by shaking it with the fingers. To orient the opening of the box towards the cup, the hand must rotate the box counterclockwise (red arrow in the second image from the left). To complete this task, the hand should grasp the object as shown in (A) after manipulation. If the object is grasped as in (B) or (C), problems arise: the fingers block the opening, or the object orientation cannot be changed properly.
  • Figure 2: Action representations and primitives. I, M, R, and T represent the index finger, middle finger, ring finger, and thumb, respectively. This figure shows an example of a contact-state transition when rotating the box counterclockwise. Detach, crossover, and attach are the action representations that change the contact-state. The initial contact-state of a primitive is set to the most stable one, in which all fingers are in contact with the object.
  • Figure 3: The contact-state transitions in the targeted manipulation. Each section enclosed by dashed lines represents an action primitive.
  • Figure 4: Training overview. (A) explains the initial states used for training. The initial states of Policy B, C, D are the output states of Policy A, B, C, respectively. (B) illustrates the steps of teacher-student learning. The teacher policy takes observed information $\bm{o}_{t-l_1:t}$ and a latent variable $\bm{z}_t$, encoded from privileged information $\bm{h}_t$ by the encoder $\mu$, as input to predict an action $\bm{a}_t$. In contrast, the student policy uses $\bm{o}_{t-l_1:t}$ and a latent variable $\hat{\bm{z}}_t$, with $\hat{\bm{z}}_t$ being encoded from $\bm{o}_{t-l_2:t}$ by the encoder $\phi$.
  • Figure 5: Example results. (A): Baseline A, (B): Baseline B, (C): Ours. At the final timestep, (A) rotated the object but did not realize the intended grasp. In (B), the object was dropped. In contrast, (C) achieved both the rotation and the desired grasp.
  • ...and 2 more figures
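The initial-state scheme in the Figure 4(A) caption, where Policies B, C, and D start from the output states of Policies A, B, and C, can be sketched as follows. This is a minimal sketch under stated assumptions rather than the paper's training code: `make_toy_policy` and `build_initial_state_sets` are hypothetical names, each "policy" is a stand-in function that only records its own name in the state, and actual training of each policy is omitted.

```python
def make_toy_policy(name):
    """Return a stand-in policy that appends its name to the state trace.

    A real primitive policy would move the fingers; this toy version only
    makes it visible which policies a state has already passed through.
    """
    def policy(state):
        return state + (name,)
    return policy


def build_initial_state_sets(policies, first_initial_states):
    """Chain policies so each one starts from its predecessor's outputs.

    policies: ordered list of (name, policy) pairs.
    first_initial_states: initial states for the first policy.
    Returns the per-policy initial states and the final output states.
    """
    initial_states = {}
    current = list(first_initial_states)
    for name, policy in policies:
        # In real training, the policy would be trained on `current`
        # before being rolled out to produce the next policy's states.
        initial_states[name] = current
        current = [policy(state) for state in current]
    return initial_states, current


# Four chained primitives, named after the policies in Figure 4.
policies = [(name, make_toy_policy(name)) for name in "ABCD"]
initial_states, final_states = build_initial_state_sets(
    policies, [("s0",), ("s1",)]
)
```

Starting each later policy from states its predecessor actually produces, rather than from idealized states, is one way to keep the chained policies compatible at their hand-over points.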