Speedup Patch: Learning a Plug-and-Play Policy to Accelerate Embodied Manipulation

Zhichao Wu, Junyin Ye, Zhilong Zhang, Yihao Sun, Haoxin Lin, Jiaheng Luo, Haoxiang Ren, Lei Yuan, Yang Yu

Abstract

While current embodied policies exhibit remarkable manipulation skills, their execution remains slow because they inherit the unhurried pacing of human demonstrations. Existing acceleration methods typically require policy retraining or costly online interactions, which limits their scalability to large-scale foundation models. In this paper, we propose Speedup Patch (SuP), a lightweight, policy-agnostic framework that enables plug-and-play acceleration using only offline data. SuP introduces an external scheduler that adaptively downsamples the action chunks produced by embodied policies to eliminate redundancies. Specifically, we formalize the optimization of our scheduler as a Constrained Markov Decision Process (CMDP) that maximizes efficiency without compromising task performance. Since direct success evaluation is infeasible in offline settings, SuP introduces world-model-based state deviation as a surrogate metric to enforce safety constraints. By leveraging a learned world model as a virtual evaluator that predicts counterfactual trajectories, the scheduler can be optimized via offline reinforcement learning. Empirical results on simulation benchmarks (Libero, Bigym) and real-world tasks validate that SuP achieves an overall 1.8x execution speedup for diverse policies while maintaining their original success rates.

Paper Structure (33 sections, 2 theorems, 19 equations, 13 figures, 11 tables, 1 algorithm)

Key Result

Proposition 3.1

Given the zero-violation constraint ($h_q(s_t,k_t)=0$) at every state, the scheduler is guaranteed to maintain or improve the success rate of the base policy. See the appendix for the proof.
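
For orientation, the constrained problem behind this proposition can be written in a generic CMDP form. This is only a sketch under stated assumptions, not the paper's exact objective, which this summary does not reproduce: the efficiency reward $r(s_t,k_t)$ and the discount factor $\gamma$ are hypothetical placeholders, while $\pi$, $s_t$, $k_t$, and $h_q$ are the scheduler policy, state, downsampling rate, and violation function named above.

$$
\max_{\pi}\; \mathbb{E}_{\pi}\Big[\sum_{t} \gamma^{t}\, r(s_t, k_t)\Big]
\quad \text{s.t.} \quad h_q(s_t, k_t) = 0 \;\; \text{for all } t .
$$

Under this reading, the scheduler is free to pick any downsampling rate that speeds up execution, as long as no step along the trajectory triggers a violation.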

Figures (13)

  • Figure 1: Plug-and-Play Speedup via Scheduler Policy. The scheduler policy $\pi$ predicts a downsampling rate $k$ to downsample the action chunk from the frozen policy into a shorter chunk for acceleration.
  • Figure 2: Success rate and Violation count. Each subplot (a--d) illustrates the relationship between the cumulative count of violations ($h_\mathcal{E}=1$) and the task success rate across different LIBERO suites. The bars represent the conditional success rate for the subset of trajectories containing at least $x$ violations, with the sample size of each subset annotated above the corresponding bar.
  • Figure 3: The training process of SuP. Our method is trained purely on offline datasets through three phases: (1) recurrent world model learning; (2) data synthesis; and (3) scheduler optimization via IQL. Through this pipeline, we optimize the scheduler to achieve the maximum possible speedup while preserving comparable performance.
  • Figure 4: Simulation Tasks. We systematically evaluate SuP across 20 tasks from Bigym and 4 task suites (40 tasks in total) from Libero.
  • Figure 5: Real-world Tasks Illustration. We illustrate the procedure of 3 real-world tasks: (a) Arrange Table, (b) Fold Towel, and (c) Stack Plates.
  • ...and 8 more figures
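
The downsampling step described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the action chunk is a `(T, action_dim)` array of absolute target actions, so that skipping intermediate targets shortens execution. The function name `downsample_chunk` and the choice to always keep the final action are hypothetical.

```python
import numpy as np


def downsample_chunk(chunk: np.ndarray, k: int) -> np.ndarray:
    """Shorten a (T, action_dim) action chunk by keeping every k-th action.

    Assumption: actions are absolute targets, so dropping intermediate
    ones is meaningful. The last action is always kept so the shortened
    chunk ends at the same target as the original.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    num_steps = chunk.shape[0]
    # Keep the endpoint of each window of k consecutive actions.
    idx = np.arange(k - 1, num_steps, k)
    # Append the final action if the stride did not land on it.
    if idx.size == 0 or idx[-1] != num_steps - 1:
        idx = np.append(idx, num_steps - 1)
    return chunk[idx]


if __name__ == "__main__":
    chunk = np.arange(16, dtype=float).reshape(8, 2)  # toy chunk, T = 8
    for k in (1, 2, 3):
        print(k, downsample_chunk(chunk, k).shape)
```

With `k = 1` the chunk is returned unchanged, and larger values of `k` produce proportionally shorter chunks. In SuP the rate `k` is predicted by the scheduler policy; here it is simply passed in by hand.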

Theorems & Definitions (2)

  • Proposition 3.1
  • Proposition 4.1