SpeedAug: Policy Acceleration via Tempo-Enriched Policy and RL Fine-Tuning

Taewook Nam, Sung Ju Hwang

TL;DR

The paper tackles the slow execution of pre-trained robotic policies, which typically need costly, faster demonstrations before they can behave faster. It introduces SpeedAug, a two-stage framework. The first stage pre-trains a diffusion-based policy on speed-augmented demonstrations so that it encodes diverse tempos. The second stage fine-tunes that policy with RL (DPPO) so that it converges on safe, fast execution. The key contributions are the tempo-enriched pre-training of a multimodal diffusion policy and an RL fine-tuning strategy that substantially improves sample efficiency while preserving high task success. Empirical results on Robosuite and Kitchen tasks show significant speedups with fewer online samples, highlighting practical benefits for accelerated robotic manipulation.

Abstract

Recent advances in robotic policy learning have enabled complex manipulation in real-world environments, yet the execution speed of these policies often lags behind hardware capabilities because collecting faster demonstrations is costly. Existing policy acceleration methods reinterpret action sequences at unseen execution speeds and therefore suffer from distribution shift relative to the original demonstrations. Reinforcement learning is a promising approach that adapts policies for faster execution without additional demonstrations, but its unguided exploration is sample-inefficient. We propose SpeedAug, an RL-based policy acceleration framework that efficiently adapts pre-trained policies for faster task execution. SpeedAug constructs a behavior prior that spans diverse tempos of task execution by pre-training a policy on speed-augmented demonstrations. Empirical results on robotic manipulation benchmarks show that RL fine-tuning initialized from this tempo-enriched policy significantly improves the sample efficiency of existing RL and policy acceleration methods while maintaining a high success rate.
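The abstract does not say exactly how demonstrations are speed-augmented. The following is a minimal sketch of one plausible approach, not the authors' implementation. It assumes each demonstration is a (T, D) array of absolute actions, such as end-effector targets, that can be linearly interpolated in time. The sketch resamples the trajectory at a stride of v along the original time axis, so the same path is traced in about (T - 1) / v steps. The acceleration factor v is either drawn uniformly from [1, v_max] or fixed at v_max, mirroring the random-versus-constant comparison named in the Figure 4 caption. The names `speed_augment`, `augment_dataset`, `n_aug` and `constant` are hypothetical. Binary gripper commands would need separate handling rather than linear interpolation.

```python
import numpy as np


def speed_augment(actions: np.ndarray, v: float) -> np.ndarray:
    """Resample a (T, D) action trajectory so it plays back about v times faster.

    Hypothetical helper: assumes absolute actions (e.g. end-effector targets)
    that can be linearly interpolated in time.
    """
    if v < 1.0:
        raise ValueError("acceleration factor v must be >= 1")
    T, D = actions.shape
    # The original path spans T - 1 steps; at speed v it spans (T - 1) / v steps.
    # Round up (with a small float tolerance) so the endpoint is always reached.
    new_len = int(np.ceil((T - 1) / v - 1e-9)) + 1
    # Query times along the original axis, clipped so the last sample is the endpoint.
    query_t = np.minimum(np.arange(new_len) * v, T - 1)
    src_t = np.arange(T)
    # Interpolate each action dimension independently.
    return np.stack([np.interp(query_t, src_t, actions[:, d]) for d in range(D)], axis=1)


def augment_dataset(demos, v_max, n_aug, rng, constant=False):
    """Return each original demo plus n_aug accelerated copies (hypothetical).

    constant=True always uses v_max; otherwise v ~ Uniform(1, v_max).
    """
    out = []
    for demo in demos:
        out.append(demo)  # keep the original tempo (v = 1)
        for _ in range(n_aug):
            v = v_max if constant else rng.uniform(1.0, v_max)
            out.append(speed_augment(demo, v))
    return out


# Usage: an 11-step straight-line demo in a 3-D action space.
demo = np.linspace(0.0, 1.0, 11)[:, None].repeat(3, axis=1)
fast = speed_augment(demo, 2.0)
print(fast.shape)  # (6, 3)
data = augment_dataset([demo], v_max=2.0, n_aug=3, rng=np.random.default_rng(0))
print([d.shape[0] for d in data])
```

With v = 2, the 11-step demo becomes 6 steps that reach the same start and end points. In this sketch, the original v = 1 demo is kept alongside the accelerated copies. The intent is for the pre-training data to cover slow and fast tempos alike, rather than only the fastest one.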

Paper Structure

This paper contains 15 sections, 8 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: Our Framework. We propose to fine-tune a tempo-enriched policy with RL for efficient policy acceleration. Pre-training on speed-augmented demonstrations constructs a behavior prior that can explore in a structured way and acquire actions that complete the task faster.
  • Figure 2: Online Fine-Tuning Performance. Success rate and task completion time as a function of collected environment steps. Compared with an RL baseline and other policy acceleration methods, our model achieves faster execution using fewer online samples while maintaining high success rates.
  • Figure 3: Qualitative Results. Generated action sequences visualized on the XZ-plane of the robot workspace. Our pre-trained policy explores diverse tempos of actions, and fine-tuning then shifts the action distribution toward faster actions while preserving behavior diversity. Without an acceleration strategy, unguided random exploration searches inefficiently for faster behavior. Other acceleration methods do explore faster actions but lack the behavior diversity needed to keep exploring for actions that are both faster and stable.
  • Figure 4: Ablation study on the acceleration factor during pre-training. Left: The sample efficiency comparison for different choices of the acceleration range $[1, v_\text{max}]$. Center: The sample efficiency comparison between constant acceleration and randomly sampled acceleration. Right: The sample efficiency comparison between constant acceleration and randomly sampled acceleration.
  • Figure 5: Block Lifting. Lift a block until it reaches a target height. Pick-and-Place Can. Pick up a can and place it into the correct bin. Nut Assembly Square. Fit the square-shaped nut onto the matching peg. Kitchen. Complete 7 possible subtasks by manipulating the objects correctly.