Sim-to-Real Dynamic Object Manipulation on Conveyor Systems via Optimization Path Shaping
Zhuoling Li, Jinrong Yang, Yong Zhao, Liangliang Ren, Xiaoyang Wu, Zhenhua Xu, Hengshuang Zhao
TL;DR
The paper addresses generalizable dynamic object manipulation on conveyors with GEM, a geometry-focused imitation-learning policy that prioritizes 3D structure over visual appearance to bridge the sim-to-real gap. GEM uses appearance noise annealing to shape the optimization path, steering the network toward geometry-dominated representations, and decomposes manipulation actions into tracking and interaction components so that it can handle objects moving at unseen speeds. The policy is trained in Isaac Gym on diverse 3D geometries and four manipulation skills, then evaluated in in-domain and out-of-domain simulations and in real-world settings, including a seven-day canteen deployment that achieved a 97.2% success rate across more than 10,000 operations. The results show generalization across backgrounds, motion patterns, unseen objects, and robot embodiments, offering a practical solution for industrial automation with minimal real-world data collection. Besides geometry-centric representations and action decomposition, the design includes a probabilistic action head and memory.
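As a rough illustration of the tracking/interaction decomposition described above, the sketch below composes a per-step end-effector displacement from two parts: a tracking term that follows the object's motion on the conveyor, and an interaction term expressed relative to the object. The function name, the additive composition, and the use of a known object velocity are assumptions made for illustration; the summary does not specify how GEM combines the two components.

```python
def compose_action(interaction_delta, object_velocity, dt):
    """Combine tracking and interaction into one world-frame displacement.

    Hypothetical sketch, not the paper's implementation.
    interaction_delta: displacement (m) the policy wants relative to the object.
    object_velocity:   estimated object velocity (m/s) on the conveyor.
    dt:                control period (s).
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    # Tracking term: move with the object so the relative pose is preserved.
    tracking_delta = [v * dt for v in object_velocity]
    # Interaction term is added on top, so it does not depend on belt speed.
    return [t + i for t, i in zip(tracking_delta, interaction_delta)]


# Same interaction command at two belt speeds: only the tracking part changes.
slow = compose_action([0.0, 0.0, -0.01], [0.1, 0.0, 0.0], dt=0.05)
fast = compose_action([0.0, 0.0, -0.01], [0.4, 0.0, 0.0], dt=0.05)
```

Under this assumption, the interaction term is independent of conveyor speed, which is one plausible reading of why a decomposed policy could transfer to speeds not seen in training.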
Abstract
Realizing generalizable dynamic object manipulation on conveyor systems is important for enhancing manufacturing efficiency, as it eliminates specialized engineering for different scenarios. To this end, imitation learning emerges as a promising paradigm, leveraging expert demonstrations to teach a policy manipulation skills. Although the generalization of an imitation learning policy can be improved by adding more demonstrations, collecting demonstrations is labor-intensive, and public data for dynamic object manipulation is scarce. In this work, we address this data scarcity problem by generating demonstrations in a simulator. A significant challenge of using simulated data lies in the appearance gap between simulated and real-world observations. To tackle this challenge, we propose the Geometry-Enhanced Model (GEM), which employs our appearance noise annealing strategy to shape the policy optimization path, thereby prioritizing the geometry information in observations. Extensive experiments on simulated and real-world tasks demonstrate that GEM generalizes across environment backgrounds, robot embodiments, motion dynamics, and object geometries. Notably, GEM is deployed in a real canteen for tableware collection. Without any test-scene data, GEM achieves a success rate of over 97% across more than 10,000 operations.
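To make the idea of appearance noise annealing more concrete, here is a minimal sketch under stated assumptions: noise is applied only to color values while depth is left untouched, and the noise scale decays linearly over training. The linear schedule, the Gaussian noise model, the decay direction, and all names (`noise_scale`, `perturb_appearance`, `sigma_start`, `sigma_end`) are hypothetical; the abstract does not give the actual schedule or noise type used by GEM.

```python
import random


def noise_scale(step, total_steps, sigma_start=1.0, sigma_end=0.0):
    """Linearly interpolate the noise scale from sigma_start to sigma_end.

    Assumed schedule for illustration only.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)  # clamp progress to [0, 1]
    return sigma_start + (sigma_end - sigma_start) * frac


def perturb_appearance(rgb, depth, sigma, rng):
    """Add Gaussian noise to color values in [0, 1]; return depth unchanged."""
    noisy_rgb = [min(1.0, max(0.0, c + rng.gauss(0.0, sigma))) for c in rgb]
    return noisy_rgb, list(depth)


rng = random.Random(0)
rgb, depth = [0.2, 0.5, 0.8], [1.10, 1.12, 1.15]
for step in (0, 500, 1000):
    sigma = noise_scale(step, total_steps=1000)
    noisy_rgb, same_depth = perturb_appearance(rgb, depth, sigma, rng)
```

With heavy color noise early in training, appearance carries little usable signal, so a policy trained this way would have to rely on geometry first; as the noise is annealed, appearance cues become available again. This is one way such a schedule could shape the optimization path, not a description of the authors' exact procedure.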
