
Dense-Jump Flow Matching with Non-Uniform Time Scheduling for Robotic Policies: Mitigating Multi-Step Inference Degradation

Zidong Chen, Zihao Guo, Peng Wang, ThankGod Itua Egbe, Yan Lyu, Chenghao Qian

TL;DR

A novel policy is proposed that uses non-uniform time scheduling during training, emphasising both early and late temporal stages to regularise policy training, together with a dense-jump integration schedule at inference, which replaces the multi-step integration beyond a jump point with a single integration step and thereby avoids the unstable region near t = 1.
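The training-time schedule can be illustrated with a short sketch. The paper names a U-shaped density only as an example, so the Beta(α, α) family with α < 1 used below is an assumption, as are the linear interpolation path and the Gaussian noise source (the latter two are consistent with the expert velocity $\mathbf{a}_1 - \mathbf{a}_0$ defined in the Figure 1 caption). The names `sample_times`, `flow_matching_batch` and `edge_mass` are hypothetical.

```python
import numpy as np


def sample_times(n, schedule="u_shaped", alpha=0.5, rng=None):
    """Draw n flow-matching times in [0, 1].

    'uniform'  : t ~ U(0, 1).
    'u_shaped' : t ~ Beta(alpha, alpha) with 0 < alpha < 1 (assumed family),
                 which is U-shaped: extra mass near t = 0 and t = 1, less in between.
    """
    rng = np.random.default_rng() if rng is None else rng
    if schedule == "uniform":
        return rng.uniform(0.0, 1.0, size=n)
    if schedule == "u_shaped":
        if not 0.0 < alpha < 1.0:
            raise ValueError("alpha must lie in (0, 1) for a U-shaped Beta density")
        return rng.beta(alpha, alpha, size=n)
    raise ValueError(f"unknown schedule: {schedule!r}")


def flow_matching_batch(a1, rng, schedule="u_shaped", alpha=0.5):
    """Build regression inputs/targets for expert actions a1 of shape (B, D).

    Assumes the linear path a_t = (1 - t) a0 + t a1 with Gaussian noise a0;
    a network v_theta(a_t, t, o) would be regressed onto v_target.
    """
    a0 = rng.standard_normal(a1.shape)                   # noise sample
    t = sample_times(a1.shape[0], schedule, alpha, rng)  # shape (B,)
    a_t = (1.0 - t[:, None]) * a0 + t[:, None] * a1      # interpolated action
    v_target = a1 - a0                                   # constant velocity of the path
    return a_t, t, v_target


def edge_mass(t, width=0.1):
    """Fraction of samples lying within `width` of either endpoint."""
    return float(np.mean((t < width) | (t > 1.0 - width)))


rng = np.random.default_rng(0)
print("uniform  :", round(edge_mass(sample_times(100_000, "uniform", rng=rng)), 2))
print("U-shaped :", round(edge_mass(sample_times(100_000, "u_shaped", 0.5, rng)), 2))
```

With α = 0.5 (the arcsine density), about 41% of the mass lies within 0.1 of either endpoint, against 20% under uniform sampling; smaller α pushes still more mass towards t = 0 and t = 1.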

Abstract

Flow matching has emerged as a competitive framework for learning high-quality generative policies in robotics; however, we find that generalisation arises and saturates early along the flow trajectory, in accordance with recent findings in the literature. We further observe that increasing the number of Euler integration steps during inference counter-intuitively and universally degrades policy performance. We attribute this to (i) additional, uniformly spaced integration steps oversampling the late-time region, thereby constraining actions towards the training trajectories and reducing generalisation; and (ii) the learned velocity field becoming non-Lipschitz as integration time approaches 1, causing instability. To address these issues, we propose a novel policy that uses non-uniform time scheduling (e.g., U-shaped) during training, which emphasises both early and late temporal stages to regularise policy training, and a dense-jump integration schedule at inference, which replaces the multi-step integration beyond a jump point with a single integration step, thereby avoiding the unstable region near t = 1. Essentially, our policy is an efficient one-step learner that still improves performance through multi-step integration, yielding up to 23.7% performance gains over state-of-the-art baselines across diverse robotic tasks.
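The inference schedule described above can be sketched as follows. The spacing of the dense steps (uniform on [0, t_jump]), the names `dense_jump`, `n_dense`, `t_jump` and `v_toy`, and the velocity-field signature `v(a, t, obs)` are illustrative assumptions, not the authors' implementation; `euler_uniform` is the uniformly spaced multi-step baseline the abstract compares against.

```python
import numpy as np


def euler_uniform(v, a0, obs, n_steps):
    """Baseline: n_steps uniformly spaced Euler steps from t = 0 to t = 1."""
    a, dt = a0, 1.0 / n_steps
    for k in range(n_steps):
        a = a + dt * v(a, k * dt, obs)
    return a


def dense_jump(v, a0, obs, n_dense, t_jump):
    """Sketch of a dense-jump schedule.

    Takes n_dense uniformly spaced Euler steps on [0, t_jump] (assumed spacing),
    then a single Euler step of size 1 - t_jump straight to t = 1, so the
    velocity field is never queried at times later than t_jump.
    """
    if not 0.0 < t_jump < 1.0:
        raise ValueError("t_jump must lie in (0, 1)")
    a, dt = a0, t_jump / n_dense
    for k in range(n_dense):
        a = a + dt * v(a, k * dt, obs)
    return a + (1.0 - t_jump) * v(a, t_jump, obs)  # the single "jump" step


# Toy check with a hypothetical field that is exact for one target action a_star:
# v(a, t) = (a_star - a) / (1 - t). It is constant along the straight path from
# a0 to a_star, so both integrators land on a_star; they differ only for
# imperfect learned fields.
a_star = np.array([1.0, -2.0, 0.5])


def v_toy(a, t, obs):
    return (a_star - a) / (1.0 - t)


a0 = np.zeros(3)
print(euler_uniform(v_toy, a0, None, n_steps=10))
print(dense_jump(v_toy, a0, None, n_dense=5, t_jump=0.8))
```

With a trained network in place of `v_toy`, the sketch makes `n_dense + 1` velocity evaluations and never evaluates the field beyond `t_jump`, which is the property the dense-jump schedule relies on to stay away from t = 1.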

Paper Structure

This paper contains 20 sections, 1 theorem, 13 equations, 3 figures, 3 tables and 1 algorithm.

Key Result

Theorem III.1

Let $U \subseteq \mathbb{R}^n\times\mathbb{R}$ be open and let $v: U \to \mathbb{R}^n$, $(x,t)\mapsto v(x,t)$, be continuous in $t$ and locally Lipschitz in $x$. Then, for any $(x_0,t_0)\in U$, the initial value problem $\dot{x}(t) = v(x(t),t)$, $x(t_0) = x_0$, admits a unique solution in a neighborhood of $t_0$.
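A toy calculation shows why the Lipschitz hypothesis becomes fragile as $t \to 1$; it is an illustration under stated assumptions, not the paper's analysis. Assume the linear path $a_t = (1-t)a_0 + t a_1$ with $a_0 \sim \mathcal{N}(0,1)$ and a training set of just two 1-D actions $\{-1, +1\}$ drawn with equal probability. The exact marginal velocity is then $v(x,t) = \big(\tanh(xt/(1-t)^2) - x\big)/(1-t)$, whose slope in $x$ is largest at $x = 0$ and equals $t/(1-t)^3 - 1/(1-t)$ there. The function names below are hypothetical.

```python
import numpy as np


def marginal_velocity(x, t):
    """Exact marginal velocity for a toy 1-D dataset {-1, +1}.

    Assumes a_t = (1 - t) a0 + t a1, a0 ~ N(0, 1), a1 uniform on {-1, +1}.
    Then E[a1 | a_t = x] = tanh(x t / (1 - t)^2) and
    v(x, t) = (E[a1 | a_t = x] - x) / (1 - t), defined only for t < 1.
    """
    s = 1.0 - t
    return (np.tanh(x * t / s**2) - x) / s


def slope_at_origin(t, h=1e-7):
    """Central finite difference of dv/dx at x = 0, where dv/dx is maximal."""
    return (marginal_velocity(h, t) - marginal_velocity(-h, t)) / (2.0 * h)


def slope_closed_form(t):
    """Analytic dv/dx at x = 0: t / (1 - t)^3 - 1 / (1 - t)."""
    s = 1.0 - t
    return t / s**3 - 1.0 / s


for t in (0.5, 0.9, 0.99):
    print(f"t = {t:<4}  numeric {slope_at_origin(t):12.1f}  analytic {slope_closed_form(t):12.1f}")
```

The slope grows roughly like $(1-t)^{-3}$, so no single Lipschitz constant covers all of $t \in [0, 1)$, whereas on $[0, t_{\text{jump}}]$ with $t_{\text{jump}} < 1$ it stays finite. This is consistent with the motivation for evaluating the learned field only up to a jump point.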

Figures (3)

  • Figure 1: Cosine similarity between the learned velocity $\hat{\text{v}} = v_\theta(\mathbf{a}_t, t, \mathbf{o})$ and (1) the ground-truth expert velocity $\text{v}_{\text{true}} = (\mathbf{a}_1 \mid \mathbf{o}) - \mathbf{a}_0$, and (2) the velocity towards the nearest training action, $\text{v}_{\text{KNN}} = \mathbf{a}_{\text{KNN}} - \mathbf{a}_0$, as a function of integration time $t$. Results are shown for Walker2D, Adroit Pen Sparse and Humanoid Standup (left to right). Both similarities degrade as $t \to 1$, indicating that the learned velocity becomes less accurate at late times in all three tasks. Notably, in the mid-to-late stages, alignment with the nearest training action, $\cos(\hat{\text{v}}, \text{v}_{\text{KNN}})$, exceeds that with the ground truth, $\cos(\hat{\text{v}}, \text{v}_{\text{true}})$, demonstrating that the learned velocity field overfits to training actions rather than generalising to the true expert action. This highlights a localised overfitting phenomenon in flow matching policies at mid-to-late integration times.
  • Figure 2: Comparison of time sampling strategies for flow matching training. Uniform sampling (top) assigns constant probability density over $t \in [0,1]$, whereas U-shaped sampling (bottom) concentrates additional probability mass at early times ($t \approx 0$) and late times ($t \approx 1$), with reduced sampling in the intermediate region. The U-shaped distribution therefore trains more intensively in the condition-led region (near $t=0$) and in the critical terminal region (near $t=1$), where an accurate velocity field is essential for stable Dense-Jump inference, while still regularising against localised overfitting at intermediate times.
  • Figure 3: Three simulation benchmarks used in our experiments.
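The alignment measurement in Figure 1 reduces to row-wise cosine similarities. The following is a minimal sketch, assuming actions are flattened to arrays of shape (B, D). The caption does not say how the nearest training action is chosen, so `nearest_training_action` (a Euclidean search in action space around whatever query is passed) is a hypothetical choice, as are all function names and the random stand-in data.

```python
import numpy as np


def cosine(u, v, eps=1e-12):
    """Row-wise cosine similarity between arrays of shape (B, D)."""
    num = np.sum(u * v, axis=-1)
    den = np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1) + eps
    return num / den


def nearest_training_action(query, train_actions):
    """Hypothetical neighbour rule: Euclidean nearest training action to each query row."""
    d2 = np.sum((query[:, None, :] - train_actions[None, :, :]) ** 2, axis=-1)  # (B, N)
    return train_actions[np.argmin(d2, axis=1)]


def velocity_alignment(v_hat, a0, a1_true, a_knn):
    """Batch-mean cos(v_hat, v_true) and cos(v_hat, v_knn) at a single time t.

    v_true = a1_true - a0 (expert velocity) and v_knn = a_knn - a0 (velocity
    towards the nearest training action), following the Figure 1 definitions.
    """
    c_true = cosine(v_hat, a1_true - a0).mean()
    c_knn = cosine(v_hat, a_knn - a0).mean()
    return float(c_true), float(c_knn)


rng = np.random.default_rng(0)
train = rng.standard_normal((50, 4))  # stand-in training actions
a0 = rng.standard_normal((10, 4))     # noise samples
a1 = rng.standard_normal((10, 4))     # stand-in held-out expert actions
a_knn = nearest_training_action(a1, train)
v_hat = a_knn - a0                    # a field that has memorised the neighbours
print(velocity_alignment(v_hat, a0, a1, a_knn))
```

To trace curves like those in Figure 1, one would repeat this on a grid of $t$ values, passing the network's prediction at each $(\mathbf{a}_t, t, \mathbf{o})$ as `v_hat`.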

Theorems & Definitions (1)

  • Theorem III.1: Picard–Lindelöf