ReinforceGen: Hybrid Skill Policies with Automated Data Generation and Reinforcement Learning

Zihan Zhou, Animesh Garg, Ajay Mandlekar, Caelan Garrett

TL;DR

ReinforceGen tackles long-horizon robotic manipulation by combining automated, object-centric data generation with a Hybrid Skill Policy that is trained by imitation learning and then improved by online reinforcement learning. The system uses motion planning to bridge skills, and real-time initiation pose updates and termination predictors to stitch stages together robustly, with optional end-to-end distillation for fully learned control. Empirical results on Robosuite tasks show strong performance (80%+ success across tasks) and substantial gains from fine-tuning, replanning, and termination improvements, outperforming several baselines, especially under partial observability. The work demonstrates a practical pipeline for scalable, data-efficient learning in complex manipulation, while acknowledging limitations in demonstrations, task framing, and planning under partial observability.

Abstract

Long-horizon manipulation has been a long-standing challenge in the robotics community. We propose ReinforceGen, a system that combines task decomposition, data generation, imitation learning, and motion planning to form an initial solution, and improves each component through reinforcement-learning-based fine-tuning. ReinforceGen first segments the task into multiple localized skills, which are connected through motion planning. The skills and motion planning targets are trained with imitation learning on a dataset generated from 10 human demonstrations, and then fine-tuned through online adaptation and reinforcement learning. When benchmarked on the Robosuite dataset, ReinforceGen reaches an 80% success rate on all tasks with visuomotor controls in the highest reset range setting. Additional ablation studies show that our fine-tuning approaches contribute to an 89% average performance increase. More results and videos are available at https://reinforcegen.github.io/

Paper Structure

This paper contains 35 sections, 5 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: ReinforceGen first creates an offline dataset by synthetic data generation from a small set of source human demonstrations. The dataset is then used to train a hybrid imitation learning agent that alternates between moving to a predicted waypoint using a motion planner and directly controlling the robot using a learned policy. Finally, ReinforceGen uses reinforcement learning to fine-tune the agent with online environment interactions.
  • Figure 2: The three main components of a ReinforceGen stage. The pose predictor $\mathcal{I}_{\theta_i}$ predicts the target end-effector pose and updates the motion planner in real-time. After reaching the destination, the skill policy $\pi_{\theta_i}$ takes control to complete the stage goal, which is determined by the termination predictor $\mathcal{T}_{\theta_i}$. All three components are first imitated from a generated dataset, then fine-tuned with online data.
  • Figure 3: Depiction of how we fine-tune the three components. (A) The pose predictor is fine-tuned towards a privileged teacher; during execution, it constantly updates its prediction based on new observations and reroutes when the deviation is too large. (B) The skill policy is fine-tuned through residual reinforcement learning. (C) We purge the false-positive predictions from the termination predictor.
  • Figure 4: Success rate drops sharply as the pose target noise level increases in the second stage of Nut Assembly. See the appendix on pose noise for more details.
  • Figure 5: ReinforceGen agents complete high-precision skills with high success rates.
  • ...and 3 more figures
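
The stage structure described in the captions of Figures 2 and 3(A) can be illustrated with a small toy sketch. This is not the authors' implementation: the names `Stage`, `ToyEnv`, `run_stage`, and `step_toward` are hypothetical, the three stage components are hand-written rules standing in for learned models, and straight-line interpolation stands in for a real motion planner. It only shows the control flow: transit toward a predicted pose while re-predicting and rerouting when the target moves too far, then hand control to the skill policy until the termination predictor fires.

```python
import math
from dataclasses import dataclass
from typing import Callable, Tuple

Vec = Tuple[float, ...]
Obs = Tuple[Vec, Vec]  # (end-effector position, object position)


def step_toward(src: Vec, dst: Vec, max_step: float) -> Vec:
    """Displacement from src toward dst, clipped to length max_step."""
    gap = math.dist(src, dst)
    if gap == 0.0:
        return tuple(0.0 for _ in src)
    scale = min(1.0, max_step / gap)
    return tuple((d - s) * scale for s, d in zip(src, dst))


@dataclass
class Stage:
    """Hypothetical stand-ins for the three learned components of one stage."""
    pose_predictor: Callable[[Obs], Vec]  # observation -> target position
    skill_policy: Callable[[Obs], Vec]    # observation -> displacement action
    termination: Callable[[Obs], bool]    # observation -> stage finished?


class ToyEnv:
    """Toy world: the object jumps once after `shift_after` moves (assumption)."""

    def __init__(self, shift_after: int = 3):
        self.ee = [0.0, 0.0, 0.0]
        self.obj = [1.0, 0.0, 0.0]
        self.moves = 0
        self.shift_after = shift_after

    def observe(self) -> Obs:
        return tuple(self.ee), tuple(self.obj)

    def move(self, delta: Vec) -> None:
        self.ee = [e + d for e, d in zip(self.ee, delta)]
        self.moves += 1
        if self.moves == self.shift_after:
            self.obj[1] += 0.3  # disturbance that forces a reroute


def run_stage(stage: Stage, env: ToyEnv, replan_threshold: float = 0.05,
              transit_step: float = 0.1, max_steps: int = 500) -> dict:
    log = {"replans": 0, "transit_steps": 0, "skill_steps": 0, "success": False}
    target = stage.pose_predictor(env.observe())
    # Transit phase: re-predict the target every step, reroute on large deviation.
    for _ in range(max_steps):
        obs = env.observe()
        new_target = stage.pose_predictor(obs)
        if math.dist(new_target, target) > replan_threshold:
            target = new_target
            log["replans"] += 1
        if math.dist(obs[0], target) < 1e-6:
            break
        env.move(step_toward(obs[0], target, transit_step))
        log["transit_steps"] += 1
    # Skill phase: the policy acts until the termination predictor fires.
    for _ in range(max_steps):
        obs = env.observe()
        if stage.termination(obs):
            log["success"] = True
            break
        env.move(stage.skill_policy(obs))
        log["skill_steps"] += 1
    return log


stage = Stage(
    pose_predictor=lambda obs: (obs[1][0], obs[1][1], obs[1][2] + 0.2),
    skill_policy=lambda obs: step_toward(obs[0], obs[1], 0.05),
    termination=lambda obs: math.dist(obs[0], obs[1]) < 0.01,
)
log = run_stage(stage, ToyEnv())
print(log)
```

In this toy run the object shifts once during transit, so the loop reroutes a single time before the skill phase begins. The reroute threshold and step sizes are arbitrary illustrative values, not settings from the paper.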