BiNoMaP: Learning Category-Level Bimanual Non-Prehensile Manipulation Primitives

Huayi Zhou, Kui Jia

TL;DR

This work proposes a three-stage, RL-free framework for learning structured non-prehensile skills, and introduces a geometry-aware post-optimization algorithm to refine bimanual hand motion trajectories into executable manipulation primitives consistent with predefined motion patterns.

Abstract

Non-prehensile manipulation, which encompasses actions that do not grasp the object, such as pushing, poking, pivoting, and wrapping, remains underexplored because it is contact-rich and analytically intractable. We revisit this problem from two perspectives. First, instead of relying on single-arm setups or favorable environmental supports (e.g., walls or edges), we advocate a generalizable dual-arm configuration and establish a suite of Bimanual Non-prehensile Manipulation Primitives (BiNoMaP). Second, departing from prevailing RL-based approaches, we propose a three-stage, RL-free framework for learning structured non-prehensile skills. We begin by extracting bimanual hand motion trajectories from video demonstrations. Because these coarse trajectories suffer from perceptual noise and morphological discrepancies, we introduce a geometry-aware post-optimization algorithm that refines them into executable manipulation primitives consistent with predefined motion patterns. To enable category-level generalization, the learned primitives are further parameterized by object-relevant geometric attributes, primarily size, which allows adaptation to unseen instances with significant shape variations. Importantly, BiNoMaP supports cross-embodiment transfer: the same primitives can be deployed on two real-world dual-arm platforms with distinct kinematic configurations without redesigning the skill structures. Extensive real-robot experiments across diverse objects and spatial configurations demonstrate the effectiveness, efficiency, and strong generalization capability of our approach.
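To make the idea of "refining a noisy hand trajectory into a primitive consistent with a predefined motion pattern" concrete, here is a minimal sketch. It is NOT the geometry-aware post-optimization algorithm from the paper, whose details are not given here. It assumes the simplest possible pattern, a straight-line push, and the helper names smooth_trajectory and fit_linear_primitive are hypothetical. The noisy trajectory is first denoised with a moving-average filter, then projected onto its best-fit 3D line (found by PCA/SVD), and finally resampled into evenly spaced waypoints.

```python
import numpy as np


def smooth_trajectory(points: np.ndarray, window: int = 5) -> np.ndarray:
    """Moving-average filter along time; edges are padded by repeating endpoints."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(points, ((half, half), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    # 'valid' convolution over the padded signal returns exactly len(points) samples.
    return np.stack(
        [np.convolve(padded[:, d], kernel, mode="valid") for d in range(points.shape[1])],
        axis=1,
    )


def fit_linear_primitive(points: np.ndarray, num_waypoints: int = 10) -> np.ndarray:
    """Project a trajectory onto its best-fit 3D line (PCA) and resample it uniformly.

    Hypothetical stand-in for a 'straight-line push' motion pattern.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    # The first right-singular vector is the direction of maximum variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    direction = vt[0]
    t = centered @ direction  # scalar position of each sample along the line
    # SVD's sign is arbitrary: orient the line so motion runs from first to last sample.
    if t[-1] < t[0]:
        direction, t = -direction, -t
    ts = np.linspace(t[0], t[-1], num_waypoints)
    return centroid + ts[:, None] * direction[None, :]


# Usage: a noisy 30 cm straight push at a height of 10 cm (synthetic data).
rng = np.random.default_rng(0)
clean = np.linspace([0.0, 0.0, 0.1], [0.3, 0.0, 0.1], 50)
noisy = clean + rng.normal(scale=0.005, size=clean.shape)
primitive = fit_linear_primitive(smooth_trajectory(noisy), num_waypoints=10)
print(primitive.shape)  # (10, 3)
```

Fitting a fixed pattern, rather than only smoothing, is what removes residual jitter entirely: every output waypoint lies exactly on one line. Richer patterns such as pivoting arcs would need a different parametric fit.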

Paper Structure

This paper contains 43 sections, 5 equations, 22 figures, 3 tables, 1 algorithm.

Figures (22)

  • Figure 1: (Left) We extract coarse hand trajectories of non-prehensile skills from human video demonstrations and then transfer them to a dual-arm robot. (Right) We extensively validate BiNoMaP on four skills (i.e., poking, pivoting, pushing, and wrapping) and demonstrate its embodiment-agnostic skill transfer capability.
  • Figure 2: The framework overview of BiNoMaP. (1) The first stage leverages strong priors from hand demonstrations to obtain coarse dual-arm trajectories for non-prehensile tasks. (2) The second stage refines these trajectories to mitigate multi-source noise and improve execution stability. (3) The final stage generalizes learned skills to novel objects within the same category by parameterizing primitives.
  • Figure 3: Illustration of the full trajectory-point optimization process, using the pivoting (top) and wrapping (bottom) skills as examples. Best viewed zoomed in.
  • Figure 4: Four non-prehensile skills instantiated with different tasks and diverse objects.
  • Figure 5: Qualitative real-robot rollouts of two bimanual non-prehensile skills (pivoting and wrapping) on a second, previously unseen dual-arm manipulator platform.
  • ...and 17 more figures
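Stage (3) in Figure 2 generalizes a learned primitive to new instances of the same category by parameterizing it with object size. The sketch below shows one simple way such size parameterization could work. It is an assumption for illustration, not the paper's formulation, and the function name adapt_primitive is hypothetical. Waypoints are expressed in the reference object's normalized bounding-box frame and then mapped out using a new object's center and per-axis extents.

```python
import numpy as np


def adapt_primitive(waypoints, ref_center, ref_size, new_center, new_size):
    """Rescale primitive waypoints from a reference object to a new object instance.

    Hypothetical size parameterization: each waypoint is normalized by the
    reference object's bounding-box extents (per axis) about its center, then
    mapped out using the new object's center and extents.
    """
    waypoints = np.asarray(waypoints, dtype=float)
    ref_size = np.asarray(ref_size, dtype=float)
    if np.any(ref_size <= 0):
        raise ValueError("reference extents must be positive")
    normalized = (waypoints - np.asarray(ref_center, dtype=float)) / ref_size
    return np.asarray(new_center, dtype=float) + normalized * np.asarray(new_size, dtype=float)


# Usage: two hand contacts at the upper edges of the left/right faces of a
# 0.20 x 0.10 x 0.05 m reference box centered at the origin (synthetic numbers).
contacts = [[-0.10, 0.0, 0.025], [0.10, 0.0, 0.025]]
adapted = adapt_primitive(
    contacts,
    ref_center=[0.0, 0.0, 0.0], ref_size=[0.20, 0.10, 0.05],
    new_center=[0.5, 0.0, 0.04], new_size=[0.30, 0.15, 0.08],
)
print(adapted)  # ≈ [[0.35, 0, 0.08], [0.65, 0, 0.08]]
```

Because scaling is per axis, contacts that sat on a face of the reference object stay on the corresponding face of a larger or differently proportioned instance. A purely uniform scale would not guarantee this.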