Dynamic Test-Time Compute Scaling in Control Policy: Difficulty-Aware Stochastic Interpolant Policy

Inkook Chun; Seungjae Lee; Michael S. Albergo; Saining Xie; Eric Vanden-Eijnden

TL;DR

DA-SIP addresses the inefficiency of fixed inference budgets in diffusion- and flow-based robotic policies by introducing a difficulty classifier that selects the test-time configuration ⟨$N_t$, $\text{solver}_t$, $\text{type}_t$⟩ (step budget, solver variant, and ODE or SDE integration type) for each control cycle within a unified stochastic interpolant (SI) policy. Grounding inference in the SI framework enables dynamic trade-offs between speed and precision, allocating more compute to harder subtasks and less to easier ones. Across diverse simulated manipulation tasks, DA-SIP reduces total compute by $2.6$–$4.4\times$ while maintaining task success comparable to maximum-budget baselines, with fine-tuned VLM-based difficulty classification offering a strong balance between accuracy and latency. These results point to efficient, context-aware generative robot controllers that can operate effectively under resource constraints, and they pave the way for real-world deployment with larger robotics foundation models.

Abstract

Diffusion- and flow-based policies deliver state-of-the-art performance on long-horizon robotic manipulation and imitation learning tasks. However, these controllers use a fixed inference budget at every control step regardless of task complexity, which wastes computation on simple subtasks and can underperform on challenging ones. To address these issues, we introduce the Difficulty-Aware Stochastic Interpolant Policy (DA-SIP), a framework that enables robotic controllers to adaptively adjust their integration horizon in real time based on task difficulty. Our approach employs a difficulty classifier that analyzes observations to dynamically select the step budget, the solver variant, and the integration mode (ODE or SDE) at each control cycle. DA-SIP builds on the stochastic interpolant formulation to provide a unified framework that unlocks diverse training and inference configurations for diffusion- and flow-based policies. In comprehensive benchmarks across diverse manipulation tasks, DA-SIP achieves a 2.6–4.4× reduction in total computation time while maintaining task success rates comparable to fixed maximum-computation baselines. By implementing adaptive computation within this framework, DA-SIP transforms generative robot controllers into efficient, task-aware systems that allocate inference resources where they provide the greatest benefit.
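The per-cycle selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the difficulty labels, the `CONFIG_BY_DIFFICULTY` table and its numbers, the `noise_scale` parameter, and the toy velocity field are all hypothetical placeholders. The stochastic branch only adds Gaussian noise to an Euler step, whereas a faithful SI-based SDE sampler would also modify the drift.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceConfig:
    num_steps: int  # N_t: integration step budget
    solver: str     # solver_t: "euler" or "heun"
    mode: str       # type_t: "ode" or "sde"


# Hypothetical difficulty-to-configuration table (illustrative values only).
CONFIG_BY_DIFFICULTY = {
    "easy": InferenceConfig(2, "euler", "ode"),
    "medium": InferenceConfig(10, "heun", "ode"),
    "hard": InferenceConfig(50, "euler", "sde"),
}


def sample_action(velocity, x0, cfg, rng, noise_scale=0.1):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 under cfg.

    Returns the final state and the number of velocity evaluations,
    which serves as a simple proxy for compute cost.
    """
    x = list(x0)
    h = 1.0 / cfg.num_steps
    evals = 0
    for k in range(cfg.num_steps):
        t = k * h
        v = velocity(x, t)
        evals += 1
        if cfg.solver == "heun":
            # Heun: average the slope at the start and at an Euler predictor.
            x_pred = [xi + h * vi for xi, vi in zip(x, v)]
            v_next = velocity(x_pred, t + h)
            evals += 1
            v = [0.5 * (a + b) for a, b in zip(v, v_next)]
        x = [xi + h * vi for xi, vi in zip(x, v)]
        if cfg.mode == "sde":
            # Simplified Euler-Maruyama-style noise injection (assumption).
            x = [xi + noise_scale * math.sqrt(h) * rng.gauss(0.0, 1.0) for xi in x]
    return x, evals


def control_step(observation, classifier, velocity, x0, rng):
    """One control cycle: classify difficulty, then sample with that budget."""
    cfg = CONFIG_BY_DIFFICULTY[classifier(observation)]
    return sample_action(velocity, x0, cfg, rng)


def toy_velocity(x, t):
    # Toy field pulling each coordinate toward 1.0 (stand-in for a learned model).
    return [1.0 - xi for xi in x]
```

In this sketch, an "easy" cycle costs 2 velocity evaluations and a "medium" cycle costs 20, so the number of model calls per cycle tracks the predicted difficulty. This is the kind of saving the abstract attributes to adaptive budgets.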
