MOVE: A Simple Motion-Based Data Collection Paradigm for Spatial Generalization in Robotic Manipulation
Huanqian Wang, Chi Bene Chen, Yang Yue, Danhua Tao, Tong Guo, Shaoxuan Xie, Denghang Huang, Shiji Song, Guocai Yao, Gao Huang
TL;DR
The paper addresses the critical challenge of spatial generalization in robotic manipulation under data scarcity. It introduces MOVE, a motion-based data collection paradigm that augments demonstrations with dynamic translations, rotations, and camera motion to densely cover spatial configurations, trained within a diffusion-policy framework using DDIM. Across simulated Meta-World tasks and real-world experiments, MOVE consistently improves spatial generalization and data efficiency, achieving substantial relative gains over static data collection and matching longer static data budgets with shorter dynamic ones. Ablation studies confirm the value of combining multiple dynamic dimensions and show robustness to augmentation hyperparameters. The work suggests that dynamic data collection can meaningfully reduce data requirements for robust spatial generalization in robotics, with potential extensions to more complex tasks and viewpoints.
Abstract
Imitation learning has shown immense promise for robotic manipulation, yet its practical deployment is fundamentally constrained by data scarcity. Despite prior work on collecting large-scale datasets, a significant gap to robust spatial generalization remains. We identify a key limitation: individual trajectories, regardless of their length, are typically collected from a \emph{single, static spatial configuration} of the environment. This means fixed object and target positions as well as unchanging camera viewpoints, which significantly restricts the diversity of spatial information available for learning. To address this critical bottleneck in data efficiency, we propose \textbf{MOtion-Based Variability Enhancement} (\emph{MOVE}), a simple yet effective data collection paradigm that enables the acquisition of richer spatial information from dynamic demonstrations. Our core contribution is an augmentation strategy that injects motion into the movable objects in the environment during each demonstration. This process implicitly generates a dense and diverse set of spatial configurations within a single trajectory. We conduct extensive experiments in both simulation and real-world environments to validate our approach. For example, in simulation tasks requiring strong spatial generalization, \emph{MOVE} achieves an average success rate of 39.1\%, a 76.1\% relative improvement over the static data collection paradigm (22.2\%), and yields up to 2--5$\times$ gains in data efficiency on certain tasks. Our code is available at https://github.com/lucywang720/MOVE.
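
The relative improvement quoted above follows directly from the two reported average success rates. The snippet below is a minimal arithmetic check, not code from the MOVE repository; the function name `relative_improvement` and the variable names are chosen here for illustration only.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Return the relative gain of `new` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline * 100.0


# Average success rates reported in the abstract (percent).
static_success = 22.2  # static data collection paradigm
move_success = 39.1    # MOVE

gain = relative_improvement(move_success, static_success)
absolute_gain = move_success - static_success

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {gain:.1f}%")
```

The two quantities are easy to confuse: the absolute gain is 16.9 percentage points, while the 76.1\% figure is that gain expressed relative to the static baseline of 22.2\%.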
