CORN: Contact-based Object Representation for Nonprehensile Manipulation of General Unseen Objects
Yoonyoung Cho, Junhyek Han, Yoontae Cho, Beomjoon Kim
TL;DR
This work tackles nonprehensile manipulation across diverse unseen objects by introducing CORN, a contact-informed object representation learned via a collision-prediction pretraining task over a patch-based point-cloud encoder. A teacher policy, trained with privileged information, guides a student policy through distillation to operate with partial real-world observations, enabling zero-shot transfer from simulation. The approach combines a patch-transformer backbone with a collision-aware pretraining objective, achieving data- and time-efficient learning and enabling scalable parallel RL across thousands of environments. Results show state-of-the-art performance in simulation and robust sim-to-real transfer to unseen objects, highlighting CORN's potential for versatile real-world nonprehensile manipulation.
Abstract
Nonprehensile manipulation is essential for handling objects that are too thin, large, or otherwise ungraspable in the wild. To sidestep the difficulty of contact modeling in conventional modeling-based approaches, reinforcement learning (RL) has recently emerged as a promising alternative. However, previous RL approaches either lack the ability to generalize over diverse object shapes, or use simple action primitives that limit the diversity of robot motions. Furthermore, using RL over diverse object geometry is challenging due to the high cost of training a policy that takes in high-dimensional sensory inputs. We propose a novel contact-based object representation and pretraining pipeline to address these challenges. To enable massively parallel training, we leverage a lightweight patch-based transformer architecture for our point-cloud encoder, thus scaling our training across thousands of environments. Compared to learning from scratch or to other shape-representation baselines, our representation facilitates both time- and data-efficient learning. We validate the efficacy of our overall system by zero-shot transferring the trained policy to novel real-world objects. Code and videos are available at https://sites.google.com/view/contact-non-prehensile.
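
To make the idea of a patch-based point-cloud input concrete, the following is a minimal NumPy sketch of how a cloud could be split into local patches before being passed to a transformer as tokens. It is not the authors' implementation: the abstract does not say how patches are formed, so the use of farthest point sampling for patch centers, k-nearest-neighbor grouping, the function names `farthest_point_sampling` and `patchify`, and the sizes (16 patches of 32 points) are all assumptions made for illustration.

```python
import numpy as np

def farthest_point_sampling(points, num_centers, seed=0):
    """Greedily pick indices of `num_centers` points that are mutually far apart."""
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    chosen = np.empty(num_centers, dtype=np.int64)
    chosen[0] = rng.integers(n)  # arbitrary starting point
    # Distance from every point to its nearest already-chosen center.
    min_dist = np.linalg.norm(points - points[chosen[0]], axis=1)
    for i in range(1, num_centers):
        chosen[i] = int(np.argmax(min_dist))
        dist = np.linalg.norm(points - points[chosen[i]], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return chosen

def patchify(points, num_patches=16, patch_size=32, seed=0):
    """Group an (N, 3) cloud into (num_patches, patch_size, 3) local patches.

    Hypothetical helper: patch centers come from farthest point sampling and
    each patch holds the `patch_size` nearest points, expressed relative to
    its center.
    """
    center_idx = farthest_point_sampling(points, num_patches, seed)
    centers = points[center_idx]                                   # (P, 3)
    # Pairwise distances between centers and all points: (P, N).
    d = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, :patch_size]                    # (P, K)
    patches = points[knn] - centers[:, None, :]                    # (P, K, 3)
    return patches, centers

# Synthetic cloud standing in for an observed object point cloud.
cloud = np.random.default_rng(0).normal(size=(512, 3))
patches, centers = patchify(cloud)
print(patches.shape, centers.shape)  # (16, 32, 3) (16, 3)
```

In a sketch like this, each patch would be embedded into a single token, so the transformer attends over 16 tokens rather than 512 raw points. Under these assumptions, that reduction in sequence length is what keeps the encoder light enough to run across many parallel environments.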
