Learning Dexterous Manipulation with Quantized Hand State

Ying Feng, Hongjie Fang, Yinong He, Jingjing Chen, Chenxi Wang, Zihao He, Ruonan Liu, Cewu Lu

Abstract

Dexterous robotic hands enable robots to perform complex manipulations that require fine-grained control and adaptability. Achieving such manipulation is challenging because the high degrees of freedom tightly couple hand and arm motions, making learning and control difficult. Successful dexterous manipulation relies not only on precise hand motions, but also on accurate spatial positioning of the arm and coordinated arm-hand dynamics. However, most existing visuomotor policies represent arm and hand actions in a single combined space, which often causes high-dimensional hand actions to dominate the coupled action space and compromise arm control. To address this, we propose DQ-RISE, which quantizes hand states to simplify hand motion prediction while preserving essential patterns, and applies a continuous relaxation that allows arm actions to diffuse jointly with these compact hand states. This design enables the policy to learn arm-hand coordination from data while preventing hand actions from overwhelming the action space. Experiments show that DQ-RISE achieves more balanced and efficient learning, paving the way toward structured and generalizable dexterous manipulation. Project website: http://rise-policy.github.io/DQ-RISE/
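The central idea of the abstract is to keep the jointly diffused action compact by replacing a multi-dimensional hand action with a single relaxed code index. A minimal sketch follows. It assumes, and the paper does not confirm this, that each quantized hand state is identified by an integer index k in {0, ..., K-1} and that the continuous relaxation maps k linearly onto [-1, 1]. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def index_to_scalar(k: int, num_codes: int) -> float:
    """Relax a discrete hand-state index k in {0, ..., K-1} to a scalar in [-1, 1] (assumed mapping)."""
    assert num_codes >= 2 and 0 <= k < num_codes
    return 2.0 * k / (num_codes - 1) - 1.0


def scalar_to_index(x: float, num_codes: int) -> int:
    """Project a continuous prediction back to the nearest valid index (round, then clip)."""
    k = round((x + 1.0) * (num_codes - 1) / 2.0)
    return min(max(k, 0), num_codes - 1)


def pack_action(arm: Sequence[float], hand_index: int, num_codes: int) -> List[float]:
    """Joint action = arm action followed by ONE relaxed hand dimension."""
    return list(arm) + [index_to_scalar(hand_index, num_codes)]


def unpack_action(action: Sequence[float], num_codes: int) -> Tuple[List[float], int]:
    """Split a predicted joint action into the arm part and a discrete hand index."""
    return list(action[:-1]), scalar_to_index(action[-1], num_codes)
```

Under this layout, a hand whose joint vector has several dimensions contributes only one dimension to the diffused action, so it cannot outweigh the arm by sheer dimensionality. The round-and-clip step in `scalar_to_index` is our stand-in for the nearest-quantized-action projection that the paper applies at inference.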

Paper Structure

This paper contains 16 sections, 1 equation, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Dexterous Manipulation from the Action Prediction Perspective. Beyond hand motion, successful dexterous manipulation also requires precise arm localization and coordinated arm-hand dynamics. (A) Existing visuomotor policies predict arm and hand actions jointly, causing hand actions to dominate the combined action space and arm localization to suffer. (B) Naively separating arm and hand predictions can lead to incoherent coordination. (C) Our approach quantizes hand states to preserve hand motion while jointly diffusing arm actions, enabling precise arm localization and smooth arm-hand coordination.
  • Figure 2: Robot Platform and Hybrid Dexterous Teleoperation System. Our platform consists of a Flexiv robotic arm equipped with an ROHand. During teleoperation, the arm is controlled via a VR joystick; a joystick button pauses arm motion so that the operator can readjust the joystick pose, making operation more intuitive and convenient. For hand control, we use a GForce glove that directly operates the ROHand through joint correspondence.
  • Figure 3: DQ-RISE Policy Architecture. ① Hand state data from demonstrations are used to train a residual VQ-VAE [resvqvae] for hand state quantization (Method, quantization subsection); ② the trained codebooks yield $K$ quantized hand states, which are re-indexed to maintain consistency between consecutive codes and sequential continuity across all codes (Method, continuity subsection); ③ the original hand states/actions in the demonstration dataset are replaced by these re-indexed states (Method, policy subsection); ④ the visuomotor policy is trained on the transformed dataset, jointly diffusing arm and hand actions; during inference, the predicted continuous hand actions are projected to the nearest quantized actions for execution (Method, policy subsection).
  • Figure 4: Different Action Prediction Frameworks. We select RISE, RISE-S, and DQ-RISE-C as baselines and compare them with our DQ-RISE.
  • Figure 5: Task Descriptions. We evaluate six tasks covering pick-and-place (Pull Tissue, Collect Toy), articulated object manipulation (Open Jar, Open Oven), tasks requiring large rotations (Open Jar, Pour Rice), and a long-horizon task (Toast Bread). Each task is illustrated in several phases; the stages used for success-rate evaluation are highlighted in blue.
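The pause button described for Figure 2 works like a clutch: while it is held, the arm stays still and the operator can move the joystick freely. Below is a minimal, position-only sketch of such a clutch. The relative (displacement-based) mapping, the class name `ClutchTeleop`, and the omission of orientation are our assumptions, because the caption does not describe how joystick motion is mapped to the arm.

```python
from dataclasses import dataclass, field
from typing import List, Optional

Vec3 = List[float]


@dataclass
class ClutchTeleop:
    """Hypothetical position-only clutch: the arm tracks joystick displacement while engaged."""

    arm_pos: Vec3
    joy_anchor: Optional[Vec3] = field(default=None, init=False)
    arm_anchor: Optional[Vec3] = field(default=None, init=False)

    def step(self, joy_pos: Vec3, paused: bool) -> Vec3:
        if paused:
            # Arm holds its pose; dropping the anchor lets the operator reposition the joystick.
            self.joy_anchor = None
            return list(self.arm_pos)
        if self.joy_anchor is None:
            # (Re-)engage: record where the joystick and the arm are right now.
            self.joy_anchor = list(joy_pos)
            self.arm_anchor = list(self.arm_pos)
        # Arm target = arm pose at engagement + joystick displacement since engagement.
        self.arm_pos = [a + (j - j0) for a, j, j0 in zip(self.arm_anchor, joy_pos, self.joy_anchor)]
        return list(self.arm_pos)
```

For example, after pausing, moving the joystick, and resuming, the arm continues from where it stopped instead of jumping to match the joystick's new absolute pose.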
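The four steps in the Figure 3 caption can be sketched in plain Python. This is a simplified illustration under explicit assumptions, not the authors' implementation. The residual codebooks are given by hand rather than learned by a VQ-VAE, so there is no encoder or decoder. The $K$ quantized states are taken to be every combination of one code per residual stage. The re-indexing is a greedy nearest-neighbour chain, which we chose as a stand-in because the caption does not say how the ordering is computed. All function names are hypothetical.

```python
import itertools
import math
from typing import List, Sequence

Vector = List[float]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two hand-state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enumerate_states(codebooks: Sequence[Sequence[Vector]]) -> List[Vector]:
    """Step 2 (assumed): one code per residual stage, summed, for every combination."""
    return [[sum(vals) for vals in zip(*combo)] for combo in itertools.product(*codebooks)]


def reindex_greedy(states: Sequence[Vector]) -> List[Vector]:
    """Step 2 (hypothetical ordering): chain states so consecutive indices are close."""
    remaining = list(states)
    origin = [0.0] * len(remaining[0])
    current = min(remaining, key=lambda s: dist(s, origin))  # start nearest the origin
    ordered = [current]
    remaining.remove(current)
    while remaining:
        current = min(remaining, key=lambda s: dist(s, current))  # nearest unvisited state
        ordered.append(current)
        remaining.remove(current)
    return ordered


def nearest_index(hand_state: Sequence[float], table: Sequence[Vector]) -> int:
    """Step 3: replace a recorded hand state with the index of its nearest table entry."""
    return min(range(len(table)), key=lambda k: dist(hand_state, table[k]))


def decode(pred_index: float, table: Sequence[Vector]) -> Vector:
    """Step 4: round a continuous predicted index, clip it, and look up the hand state."""
    k = min(max(round(pred_index), 0), len(table) - 1)
    return table[k]


# Hypothetical 2-D hand states and two residual stages of two codes each (K = 4).
codebooks = [
    [[1.0, 1.0], [0.0, 0.0]],  # coarse stage
    [[0.0, 0.0], [0.25, 0.0]],  # residual refinement stage
]
table = reindex_greedy(enumerate_states(codebooks))
demo_indices = [nearest_index(h, table) for h in [[0.1, 0.0], [0.9, 1.1]]]
```

In this sketch, the raw enumeration order is [[1.0, 1.0], [1.25, 1.0], [0.0, 0.0], [0.25, 0.0]], and the greedy chain reorders it to [[0.0, 0.0], [0.25, 0.0], [1.0, 1.0], [1.25, 1.0]]. With neighbouring indices decoding to similar hand states, an off-by-one error after rounding a continuous prediction lands on a nearby hand configuration rather than an arbitrary one.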