
Generative Control as Optimization: Time Unconditional Flow Matching for Adaptive and Robust Robotic Control

Zunzhe Zhang, Runhan Huang, Yicheng Liu, Shaoting Zhu, Linzhan Mou, Hang Zhao

Abstract

Diffusion models and flow matching have become cornerstones of robotic imitation learning, yet they suffer from a structural inefficiency: inference is often bound to a fixed integration schedule that is agnostic to state complexity. This paradigm forces the policy to spend the same computational budget on trivial motions as on complex tasks. We introduce Generative Control as Optimization (GeCO), a time-unconditional framework that transforms action synthesis from trajectory integration into iterative optimization. GeCO learns a stationary velocity field in the action-sequence space in which expert behaviors form stable attractors. Consequently, test-time inference becomes an adaptive process that allocates computation based on convergence, exiting early for simple states while refining longer for difficult ones. Furthermore, this stationary geometry yields an intrinsic, training-free safety signal: the field norm at the optimized action serves as a robust out-of-distribution (OOD) detector, remaining low for in-distribution states and increasing significantly for anomalies. We validate GeCO on standard simulation benchmarks and demonstrate seamless scaling to pi0-series Vision-Language-Action (VLA) models. As a plug-and-play replacement for standard flow-matching heads, GeCO improves success rates and efficiency while providing an optimization-native mechanism for safe deployment. Video and code can be found at https://hrh6666.github.io/GeCO/
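The inference procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the update rule (a plain fixed-step iteration), the function names `optimize_action`, `toy_field`, and `is_ood`, and the values of `step_size`, `tol`, `max_steps`, and `threshold` are all assumptions made for illustration. The toy field simply points at a known target so that its attractor is easy to verify; a learned field $f_\theta$ would take its place in practice.

```python
import numpy as np


def optimize_action(field, obs, a0, step_size=0.5, tol=1e-3, max_steps=20):
    """Iterate a <- a + step_size * field(a, obs) until the field norm is small.

    Hypothetical sketch: returns the action, the number of field evaluations
    (NFE), and the field norm evaluated at the returned action.
    """
    a = np.asarray(a0, dtype=float).copy()
    nfe = 0
    norm = float("inf")
    for k in range(max_steps):
        v = field(a, obs)
        nfe += 1
        norm = float(np.linalg.norm(v))
        # Stop on convergence, or when the evaluation budget is exhausted.
        if norm < tol or k == max_steps - 1:
            break
        a = a + step_size * v
    return a, nfe, norm


def toy_field(a, obs):
    """Toy stationary field (assumption): a linear attractor at the target `obs`."""
    return obs - a


def is_ood(residual_norm, threshold=0.1):
    """Flag a state whose field norm stays high after optimization (toy threshold)."""
    return residual_norm > threshold


target = np.ones(4)

# "Hard" state: the initial guess is far from the attractor, so more steps are needed.
a_hard, nfe_hard, r_hard = optimize_action(toy_field, target, np.zeros(4))

# "Easy" state: the initial guess is already close, so the loop exits early.
a_easy, nfe_easy, r_easy = optimize_action(toy_field, target, target + 0.01)

# Stand-in for an anomaly (assumption): a field with no equilibrium never converges.
a_ood, nfe_ood, r_ood = optimize_action(lambda a, obs: np.full_like(a, 0.5),
                                        target, np.zeros(4))

print(nfe_easy, nfe_hard, nfe_ood, is_ood(r_hard), is_ood(r_ood))
```

In this toy setting the early-exit test and the safety signal come from the same quantity, the field norm, which is why no extra detector has to be trained. The constant field used for the anomaly case is only a stand-in to show the residual staying high.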

Paper Structure

This paper contains 64 sections, 20 equations, 6 figures, 8 tables, and 1 algorithm.

Figures (6)

  • Figure 1: Generative Control as Optimization (GeCO). (A) The Paradigm Shift: Unlike standard flow matching, which relies on rigid, time-dependent integration schedules (top), GeCO learns a stationary velocity field where inference becomes an iterative optimization process toward stable attractors (bottom). (B) Adaptive Computation: This formulation enables the policy to dynamically allocate computational budget based on state complexity, exiting early for simple transit phases (Scenario 1) while performing deep refinement for precise manipulation (Scenario 2). (C) Intrinsic Safety: The stationary geometry provides a zero-shot safety mechanism. In-distribution (ID) states converge to low-energy equilibria ($\|f_\theta\| \approx 0$), whereas out-of-distribution (OOD) anomalies exhibit persistently high field norms ($\|f_\theta\| \gg 0$), enabling robust detection.
  • Figure 2: Computation Follows Task Complexity. We visualize the spatial distribution of inference effort along a single rollout. The first three panels (a–c) are sampled from LIBERO-Spatial, and the last three panels (d–f) are from LIBERO-Object. The color of each line encodes the number of function evaluations (NFE) required for convergence at that state, ranging from blue (NFE = 1) to red (NFE = 20). This visualization illustrates how GeCO allocates more computation to challenging states while using fewer steps in easier regions.
  • Figure 3: GeCO policy execution for the Nut Assembly task. The robot performs high-precision alignment and rotational insertion.
  • Figure 4: GeCO policy execution for the Chemistry Tube Arrangement task. The policy adaptively handles the tight-tolerance insertion of fragile tubes.
  • Figure 5: Task setups for the real-world robotic deployment, showing the configurations for both the Nut Assembly and the Chemistry Tube Arrangement tasks.
  • ...and 1 more figure