Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy

Inkook Chun, Seungjae Lee, Michael S. Albergo, Saining Xie, Eric Vanden-Eijnden

TL;DR

DA-SIP addresses the inefficiency of fixed inference budgets in diffusion- and flow-based robotic policies by introducing a difficulty classifier that selects the test-time configuration ⟨$N_t$, $\text{solver}_t$, $\text{type}_t$⟩ for each control cycle within a unified stochastic interpolant (SI) policy. Grounding inference in the SI framework enables dynamic trade-offs between speed and precision, allocating more compute to harder subtasks and less to easy ones. Across diverse simulated manipulation tasks, DA-SIP reduces total compute by $2.6$–$4.4\times$ while maintaining task success comparable to maximum-budget baselines, with fine-tuned VLM-based difficulty classification offering a strong balance between accuracy and latency. These results point to efficient, context-aware generative robot controllers that can operate effectively under resource constraints and pave the way for real-world deployment with larger robotics foundation models.

Abstract

Diffusion- and flow-based policies deliver state-of-the-art performance on long-horizon robotic manipulation and imitation learning tasks. However, these controllers employ a fixed inference budget at every control step, regardless of task complexity, which wastes computation on simple subtasks and can underperform on challenging ones. To address these issues, we introduce Difficulty-Aware Stochastic Interpolant Policy (DA-SIP), a framework that enables robotic controllers to adaptively adjust their integration horizon in real time based on task difficulty. Our approach employs a difficulty classifier that analyzes observations to dynamically select the step budget, the solver variant, and ODE or SDE integration at each control cycle. DA-SIP builds upon the stochastic interpolant formulation to provide a unified framework that unlocks diverse training and inference configurations for diffusion- and flow-based policies. Through comprehensive benchmarks across diverse manipulation tasks, DA-SIP achieves a $2.6$–$4.4\times$ reduction in total computation time while maintaining task success rates comparable to fixed maximum-computation baselines. By implementing adaptive computation within this framework, DA-SIP transforms generative robot controllers into efficient, task-aware systems that allocate inference resources where they provide the greatest benefit.
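To make the per-cycle selection concrete, the following is a minimal sketch of how a difficulty label could be mapped to an inference configuration ⟨step count, solver, ODE/SDE type⟩ and how the resulting compute saving could be estimated. It assumes a three-level difficulty label and a hand-written lookup table; the names `InferenceConfig`, `CONFIG_TABLE`, `select_config`, `network_evals`, and `compute_reduction`, as well as every step count in the table, are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    num_steps: int  # step budget for this control cycle
    solver: str     # "euler" or "heun"
    mode: str       # "ode" or "sde"


# Hypothetical lookup table: the labels and budgets are assumptions for illustration.
CONFIG_TABLE = {
    "easy": InferenceConfig(num_steps=2, solver="euler", mode="ode"),
    "medium": InferenceConfig(num_steps=10, solver="euler", mode="ode"),
    "hard": InferenceConfig(num_steps=50, solver="heun", mode="sde"),
}

# Euler evaluates the network once per step; Heun evaluates it twice.
EVALS_PER_STEP = {"euler": 1, "heun": 2}


def select_config(difficulty: str) -> InferenceConfig:
    """Return the inference configuration for a predicted difficulty label."""
    return CONFIG_TABLE[difficulty]


def network_evals(cfg: InferenceConfig) -> int:
    """Count network forward passes for one control cycle."""
    return cfg.num_steps * EVALS_PER_STEP[cfg.solver]


def compute_reduction(difficulties, max_cfg: InferenceConfig) -> float:
    """Ratio of fixed maximum-budget cost to adaptive cost over an episode."""
    adaptive = sum(network_evals(select_config(d)) for d in difficulties)
    fixed = len(difficulties) * network_evals(max_cfg)
    return fixed / adaptive


if __name__ == "__main__":
    # A made-up episode in which most control cycles are easy.
    episode = ["easy"] * 6 + ["medium"] * 3 + ["hard"] * 1
    ratio = compute_reduction(episode, CONFIG_TABLE["hard"])
    print(f"compute reduction: {ratio:.2f}x")
```

In this sketch the saving depends entirely on how often the classifier predicts an easy cycle, which is why the reduction would be expected to vary from task to task. The cost model counts only policy-network forward passes and ignores the classifier's own latency.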

Paper Structure

This paper contains 36 sections, 10 equations, 3 figures, 20 tables.

Figures (3)

  • Figure 1: Overview of the DA-SIP framework with computational efficiency gains and performance retention
  • Figure 2: High-level overview of our difficulty-aware stochastic interpolant policy (DA-SIP) framework. During training, we choose (1) a prediction target (noise, score, or velocity) and (2) an interpolant (Linear, VP, GVP, etc.); this configuration yields a single generative policy network that can perform both ODE and SDE integration. At inference time, a learned difficulty classifier adaptively selects an inference configuration triple $\langle \text{step count}, \text{solver type}, \text{ODE/SDE formulation} \rangle$ based on the current state, enabling context-dependent "System 1 vs. System 2" compute that maximizes success while minimizing latency.
  • Figure 3: Robot manipulation tasks across complexity categories. (A-B) Simple manipulation tasks (Can and Lift) require minimal computational steps while maintaining high success rates. (C-D) Transport and placement tasks (Transport and Square) show greater sensitivity to configuration choices, representing medium-complexity challenges. (E-F) Precision manipulation tasks (Push T and Block Push) demonstrate significant benefits from Heun integration and variance-preserving diffusion models for fine-grained control. (G-H) Exploratory manipulation tasks (Tool Hang and Multimodal Ant) highlight that adaptive resource allocation outperforms maximum computation, addressing complex challenges with varying difficulty levels.
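
The captions above refer to solver choice (for example, Heun integration) as part of the inference configuration. As a rough illustration of why the solver matters at a fixed step budget, the sketch below integrates a toy scalar ODE from $t=0$ to $t=1$ with Euler and Heun updates and counts function evaluations. The velocity field `toy_velocity` is an assumption standing in for a learned policy network, and `integrate` is a hypothetical helper; neither is the paper's implementation.

```python
import math


def integrate(velocity, x0, num_steps, solver="euler"):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1.

    Returns the final state and the number of velocity evaluations used.
    """
    x, t = x0, 0.0
    h = 1.0 / num_steps
    evals = 0
    for _ in range(num_steps):
        v1 = velocity(x, t)
        evals += 1
        if solver == "euler":
            # First-order update: one evaluation per step.
            x = x + h * v1
        elif solver == "heun":
            # Second-order update: predict with Euler, then average the slopes.
            x_pred = x + h * v1
            v2 = velocity(x_pred, t + h)
            evals += 1
            x = x + 0.5 * h * (v1 + v2)
        else:
            raise ValueError(f"unknown solver: {solver}")
        t += h
    return x, evals


def toy_velocity(x, t):
    """Toy field with exact solution x(1) = x0 * exp(-1); an assumption for illustration."""
    return -x


if __name__ == "__main__":
    exact = math.exp(-1.0)
    for solver, steps in [("euler", 10), ("heun", 10), ("heun", 5)]:
        x1, evals = integrate(toy_velocity, 1.0, steps, solver)
        print(f"{solver:5s} steps={steps:2d} evals={evals:2d} error={abs(x1 - exact):.5f}")
```

On this toy problem, Heun is more accurate than Euler even when both are given the same number of function evaluations, at the cost of two evaluations per step. Whether that trade is worthwhile for a given control cycle is the kind of decision the difficulty classifier is described as making.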