BiNoMaP: Learning Category-Level Bimanual Non-Prehensile Manipulation Primitives
Huayi Zhou, Kui Jia
TL;DR
This work proposes a three-stage, RL-free framework for learning structured non-prehensile skills, and introduces a geometry-aware post-optimization algorithm to refine bimanual hand motion trajectories into executable manipulation primitives consistent with predefined motion patterns.
Abstract
Non-prehensile manipulation, encompassing ungraspable actions such as pushing, poking, pivoting, and wrapping, remains underexplored due to its contact-rich and analytically intractable nature. We revisit this problem from two perspectives. First, instead of relying on single-arm setups or favorable environmental supports (e.g., walls or edges), we advocate a generalizable dual-arm configuration and establish a suite of Bimanual Non-prehensile Manipulation Primitives (BiNoMaP). Second, departing from prevailing RL-based approaches, we propose a three-stage, RL-free framework for learning structured non-prehensile skills. We begin by extracting bimanual hand motion trajectories from video demonstrations. Since these coarse trajectories suffer from perceptual noise and morphological discrepancies, we introduce a geometry-aware post-optimization algorithm to refine them into executable manipulation primitives consistent with predefined motion patterns. To enable category-level generalization, the learned primitives are further parameterized by object-relevant geometric attributes, primarily size, allowing adaptation to unseen instances with significant shape variations. Importantly, BiNoMaP supports cross-embodiment transfer: the same primitives can be deployed on two real-world dual-arm platforms with distinct kinematic configurations, without redesigning skill structures. Extensive real-robot experiments across diverse objects and spatial configurations demonstrate the effectiveness, efficiency, and strong generalization capability of our approach.
