TS-DP: Reinforcement Speculative Decoding For Temporal Adaptive Diffusion Policy Acceleration

Ye Li, Jiahe Feng, Yuan Meng, Kangye Ji, Chen Tang, Xinwan Wen, Shutao Xia, Zhi Wang, Wenwu Zhu

TL;DR

The paper introduces TS-DP, a temporal-complexity-aware speculative decoding framework for Diffusion Policy (DP) that uses a distilled Transformer drafter and an RL-based scheduler to adapt inference effort to time-varying task difficulty in embodied robotics. By replacing costly denoising calls with fast drafts and verifying them in parallel, TS-DP achieves up to 4.17× faster inference without degrading task performance, reaching real-time operation at 25 Hz. Experiments across multiple benchmarks show robust acceleration and stability, with adaptive scheduling outperforming fixed-parameter baselines. This makes high-frequency diffusion-based control practical in dynamic environments and widens the applicability of DP to real-time robotic systems.

Abstract

Diffusion Policy (DP) excels in embodied control but suffers from high inference latency and computational cost due to multiple iterative denoising steps. The temporal complexity of embodied tasks demands a dynamic and adaptable computation mode. Static and lossy acceleration methods, such as quantization, fail to handle such dynamic embodied tasks, while speculative decoding offers a lossless and adaptive yet underexplored alternative for DP. However, it is non-trivial to address the following challenges: how to match the base model's denoising quality at lower cost under time-varying task difficulty in embodied settings, and how to dynamically and interactively adjust computation based on task difficulty in such environments. In this paper, we propose Temporal-aware Reinforcement-based Speculative Diffusion Policy (TS-DP), the first framework that enables speculative decoding for DP with temporal adaptivity. First, to handle dynamic environments where task difficulty varies over time, we distill a Transformer-based drafter to imitate the base model and replace its costly denoising calls. Second, an RL-based scheduler further adapts to time-varying task difficulty by adjusting speculative parameters to maintain accuracy while improving efficiency. Extensive experiments across diverse embodied environments demonstrate that TS-DP achieves up to 4.17 times faster inference with over 94% accepted drafts, reaching an inference frequency of 25 Hz and enabling real-time diffusion-based control without performance degradation.
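
The draft-then-verify control flow described in the abstract can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: the names `speculative_denoise`, `draft_step`, `verify`, and `base_step` are hypothetical, the draft length is fixed here (TS-DP's scheduler would choose it adaptively), and the correction for a rejected draft is simplified to a plain base-model step.

```python
from typing import Callable, Tuple

Step = Callable[[float, int], float]


def speculative_denoise(
    x: float,
    num_steps: int,
    draft_len: int,
    draft_step: Step,
    verify: Callable[[float, float, int], bool],
    base_step: Step,
) -> Tuple[float, int, int]:
    """Return (final sample, verification rounds, accepted drafts)."""
    if draft_len < 1:
        raise ValueError("draft_len must be at least 1")
    t = rounds = accepted = 0
    while t < num_steps:
        k = min(draft_len, num_steps - t)
        # The cheap drafter proposes k consecutive denoising steps.
        states = [x]
        for i in range(k):
            states.append(draft_step(states[-1], t + i))
        # One verification round. A real system would batch these k checks
        # into a single parallel base-model call.
        rounds += 1
        n_ok = 0
        while n_ok < k and verify(states[n_ok], states[n_ok + 1], t + n_ok):
            n_ok += 1
        accepted += n_ok
        if n_ok == k:
            x, t = states[k], t + k
        else:
            # Assumption: the first rejected draft is replaced by a
            # base-model step taken from the last accepted state.
            x, t = base_step(states[n_ok], t + n_ok), t + n_ok + 1
    return x, rounds, accepted
```

With a verifier that accepts every draft, 100 denoising steps and `draft_len=10` finish in 10 verification rounds; with a verifier that rejects everything, the loop degrades to 100 rounds, one base-model step each. These are idealized counts for the toy loop, not measured speedups.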

Paper Structure

This paper contains 15 sections, 18 equations, 7 figures, 8 tables, 1 algorithm.

Figures (7)

  • Figure 1: Vanilla Diffusion Policy vs. Our TS-DP. For embodied tasks, DP completes a task through multiple interactions with the environment, generating several segments of actions. Each segment typically requires hundreds of model calls for denoising, making the process highly time-consuming. TS-DP introduces a customized speculative decoding framework for temporal-complexity-aware acceleration of DP. A lightweight drafter generates multiple denoising results, which are verified in parallel by the DP for lossless acceleration. A scheduler further adjusts key speculative decoding parameters, such as the number of drafts and the acceptance rate, based on the complexity of different task phases, achieving adaptive and efficient acceleration.
  • Figure 1: Effect of TS-DP on Acceptance Rate and Draft Count. (Top) Acceptance Rate; (Bottom) Draft Count.
  • Figure 2: The framework of TS-DP. TS-DP is the first temporal-aware speculative decoding framework designed to accelerate Diffusion Policy. ① Decision stage: A PPO-based scheduler evaluates time-varying task difficulty using past observations and actions, and produces adaptive speculative parameters. It operates in parallel with the observation encoder, adding no extra inference latency. ② Denoising stage: Guided by the scheduler, a lightweight drafter executes multiple denoising steps sequentially, effectively replacing expensive DP calls and reducing computational load. ③ Verification stage: DP verifies all drafted steps in parallel, accepting those that pass validation and correcting the first rejected draft via reflection-maximal coupling to match the target distribution. ④ Reward generation: Based on task progress, the framework provides process rewards that encourage the scheduler to produce more valid drafts, and a final success-driven reward upon task completion, jointly guiding the scheduler toward efficient and successful task execution.
  • Figure 3: Impact of Drafter Parameters on Draft Acceptance Rate. Evolution of Metropolis–Hastings acceptance probability throughout the 100-step denoising process.
  • Figure 4: Effect of End-Effector Velocity on the Number of Accepted Drafts. Variation of Accepted Drafts and Velocity During the Can-ph Task: (Top) Accepted Drafts per Step; (Bottom) Velocity per Step.
  • ...and 2 more figures
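
The verification stage in the Figure 2 caption accepts a drafted step or corrects the first rejected one via reflection-maximal coupling. A minimal sketch of one such step is given below, under a simplifying assumption that is not stated in the source: the drafter and the base model both propose Gaussian transitions with the same isotropic standard deviation `sigma`, differing only in their means. The function name `verify_draft` and its arguments are hypothetical.

```python
import math
import random
from typing import List, Tuple


def verify_draft(
    xi: List[float],
    mu_draft: List[float],
    mu_base: List[float],
    sigma: float,
    rng: random.Random,
) -> Tuple[List[float], bool]:
    """Return (sample distributed as N(mu_base, sigma^2 I), accepted flag).

    xi is the standard-normal noise behind the draft:
    draft = mu_draft + sigma * xi.
    """
    # Mean gap between drafter and base model, in units of sigma.
    z = [(a - b) / sigma for a, b in zip(mu_draft, mu_base)]
    # log(p_base(draft) / p_draft(draft)) for equal-variance Gaussians.
    log_ratio = 0.5 * (
        sum(v * v for v in xi) - sum((v + w) ** 2 for v, w in zip(xi, z))
    )
    # Metropolis-Hastings style test: accept with probability min(1, ratio).
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return [m + sigma * v for m, v in zip(mu_draft, xi)], True
    # Rejected (implies z != 0): reflect the noise across the hyperplane
    # orthogonal to z and re-centre it on the base model's mean.
    norm_sq = sum(w * w for w in z)
    dot = sum(v * w for v, w in zip(xi, z))
    reflected = [v - 2.0 * dot / norm_sq * w for v, w in zip(xi, z)]
    return [m + sigma * v for m, v in zip(mu_base, reflected)], False
```

In this construction the returned sample follows the base model's Gaussian whether or not the draft is accepted, which is the sense in which such a verification step is lossless; the closer the drafter's mean is to the base model's, the more often the draft is kept unchanged.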