ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning
Zihan Zhou, Animesh Garg, Ajay Mandlekar, Caelan Garrett
TL;DR
ReinforceGen tackles long-horizon robotic manipulation by combining automated object-centric data generation with a Hybrid Skill Policy that is trained by imitation learning and then improved by online reinforcement learning. The system uses motion planning to bridge skills, and it relies on real-time initiation pose updates and termination predictors to stitch stages together robustly, with optional end-to-end distillation for fully learned control. Empirical results on Robosuite tasks show strong performance (80%+ success across tasks) and substantial gains from fine-tuning, replanning, and termination improvements, outperforming several baselines, especially under partial observability. The work demonstrates a practical pipeline for scalable, data-efficient learning in complex manipulation, while acknowledging limitations in demonstrations, task framing, and planning under partial observability.
Abstract
Long-horizon manipulation has been a long-standing challenge in the robotics community. We propose ReinforceGen, a system that combines task decomposition, data generation, imitation learning, and motion planning to form an initial solution, and improves each component through reinforcement-learning-based fine-tuning. ReinforceGen first segments the task into multiple localized skills, which are connected through motion planning. The skills and motion planning targets are trained with imitation learning on a dataset generated from 10 human demonstrations, and then fine-tuned through online adaptation and reinforcement learning. When benchmarked on the Robosuite dataset, ReinforceGen reaches an 80% success rate on all tasks with visuomotor controls in the highest reset range setting. Additional ablation studies show that our fine-tuning approaches contribute to an 89% average performance increase. More results and videos are available at https://reinforcegen.github.io/
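The stage structure described in the abstract (localized skills connected through motion planning) can be sketched as a simple control loop. This is only an illustrative sketch, not the authors' implementation: the names `Skill`, `predict_init_pose`, `should_terminate`, `move_to`, and `max_skill_steps` are all hypothetical, and the motion planner and learned models are stood in for by plain callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Sequence

Obs = Dict[str, float]          # hypothetical observation type
Pose = Sequence[float]          # hypothetical pose representation
Action = Sequence[float]        # hypothetical action representation


@dataclass
class Skill:
    """One localized stage of a task (all fields are assumed interfaces)."""
    predict_init_pose: Callable[[Obs], Pose]   # where the skill should start
    policy: Callable[[Obs], Action]            # closed-loop skill controller
    should_terminate: Callable[[Obs], bool]    # learned termination predictor


def run_hybrid_policy(
    skills: Iterable[Skill],
    env_step: Callable[[Action], None],
    get_obs: Callable[[], Obs],
    move_to: Callable[[Pose], None],
    max_skill_steps: int = 200,
) -> bool:
    """Run each skill in order, bridging consecutive skills with a planner.

    Returns True if every skill signalled termination within its step budget.
    """
    for skill in skills:
        # Bridge to the skill: predict a start pose and let the planner reach it.
        move_to(skill.predict_init_pose(get_obs()))

        # Execute the skill until its termination predictor fires.
        for _ in range(max_skill_steps):
            obs = get_obs()
            if skill.should_terminate(obs):
                break
            env_step(skill.policy(obs))
        else:
            # Step budget exhausted without termination: report failure.
            return False
    return True
```

In this sketch each of the three callables in `Skill` is a separate component, which mirrors the abstract's statement that the skills and the motion planning targets are trained separately and can each be fine-tuned afterwards.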
