Look-Ahead and Look-Back Flows: Training-Free Image Generation with Trajectory Smoothing

Yan Luo, Henry Huang, Todd Y. Zhou, Mengyu Wang

TL;DR

This work tackles numerical instability in training-free, flow-based image generation by proposing Look-Ahead and Look-Back latent-trajectory smoothing. Unlike velocity-field edits, these methods operate in latent space and rely on curvature-aware interpolation and EMA-based averaging to preserve the pretrained flow while reducing discretization errors. Across COCO17, CUB-200, and Flickr30K, the proposed schemes consistently improve fidelity and semantic alignment with only negligible runtime overhead. The results suggest that training-free trajectory smoothing is a robust, general strategy for stabilizing flow-based generation without retraining existing models.

Abstract

Recent advances have reformulated diffusion models as deterministic ordinary differential equations (ODEs) through the framework of flow matching, providing a unified formulation for the noise-to-data generative process. Various training-free flow matching approaches have been developed to improve image generation by adjusting the flow velocity field, eliminating the need for costly retraining. However, modifying the velocity field $v$ introduces errors that propagate through the full generation path, whereas adjustments to the latent trajectory $z$ are naturally corrected by the pretrained velocity network, reducing error accumulation. In this paper, we propose two complementary training-free trajectory smoothing schemes that use future and past velocity $v$ and latent trajectory $z$ information to refine the generative path directly in latent space: \emph{Look-Ahead}, which averages the current and next-step latents using a curvature-gated weight, and \emph{Look-Back}, which smooths latents using an exponential moving average with decay. Extensive experiments and comprehensive evaluation metrics across multiple datasets, including COCO17, CUB-200, and Flickr30K, show that the proposed training-free trajectory smoothing methods substantially outperform various state-of-the-art models.
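The two schemes named in the abstract can be illustrated with a minimal sketch on a toy ODE. This is not the authors' implementation: the curvature proxy, the gate `1 / (1 + gamma * curvature)`, the convention that `lam` weights the previous smoothed latent, the forward time direction from 0 to 1, and every function name below are assumptions made for illustration. The toy velocity stands in for a pretrained velocity network.

```python
import numpy as np


def look_ahead_step(z, t, dt, velocity, gamma=1.0):
    """Blend the current latent with a tentative next-step latent."""
    v_now = velocity(z, t)
    z_next = z + dt * v_now              # tentative full Euler step
    v_next = velocity(z_next, t + dt)    # velocity one step ahead
    # Assumed curvature proxy: relative change in velocity across the step.
    curvature = np.linalg.norm(v_next - v_now) / (np.linalg.norm(v_now) + 1e-8)
    # Assumed gate: 1 on a straight path (full step), smaller where it bends.
    weight = 1.0 / (1.0 + gamma * curvature)
    return (1.0 - weight) * z + weight * z_next


def look_back_step(z_smooth, t, dt, velocity, lam=0.5):
    """EMA update that keeps a fraction lam of the previous smoothed latent."""
    z_raw = z_smooth + dt * velocity(z_smooth, t)  # raw Euler proposal
    # Feeding the average back in makes this a damped move toward z_raw.
    return lam * z_smooth + (1.0 - lam) * z_raw


def sample(z0, velocity, n_steps, mode="euler", gamma=1.0, lam=0.5):
    """Integrate from t=0 to t=1 with the chosen update rule."""
    dt = 1.0 / n_steps
    z = np.asarray(z0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        if mode == "look_ahead":
            z = look_ahead_step(z, t, dt, velocity, gamma)
        elif mode == "look_back":
            z = look_back_step(z, t, dt, velocity, lam)
        else:
            z = z + dt * velocity(z, t)  # plain Euler baseline
    return z


# Hypothetical stand-in for a pretrained velocity network: pull toward a target.
target = np.array([1.0, -2.0])
toy_velocity = lambda z, t: target - z

for mode in ("euler", "look_ahead", "look_back"):
    print(mode, sample(np.zeros(2), toy_velocity, n_steps=10, mode=mode))
```

In this sketch, setting `gamma=0` or `lam=0` recovers plain Euler sampling, which makes the smoothing strength easy to ablate. Both rules only touch the latent `z`; the velocity function is called unchanged, mirroring the abstract's point that the pretrained flow is left intact.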

Paper Structure

This paper contains 32 sections, 16 equations, 6 figures, 5 tables, 3 algorithms.

Figures (6)

  • Figure 1: Conceptual illustration of training-free trajectory smoothing for flow sampling. Without trajectory smoothing (top), backward integration of the flow ODE diverges and overshoots in low Signal-to-Noise Ratio (SNR) regions, so the discrete trajectory deviates from the ideal continuous flow and the final samples miss the target distribution. With the trajectory smoothing mechanism (bottom), the trajectory stays close to the ideal continuous flow across both low and high SNR regions, ensuring stable progression and accurate convergence.
  • Figure 2: Schematic view of the proposed look-ahead sampling. Conventional flow sampling always takes full steps, which can overshoot in regions of high curvature and lead to a large deviation from the target. In contrast, the Look-Ahead scheme adaptively interpolates based on local curvature, modulating step sizes to better follow the underlying flow and achieve a significantly smaller endpoint error.
  • Figure 3: Schematic view of the proposed look-back sampling. Conventional sampling exhibits oscillations and overshoots the target, while Look-Back produces a smooth trajectory through exponential state averaging.
  • Figure 4: Qualitative comparison showing that Look-Ahead and Look-Back produce higher-quality images with better coherence and detail than baseline methods. Scores shown are CLAIR / CLIPScore.
  • Figure 5: Visual effects of different $\gamma$ (Look-Ahead) and $\lambda$ (Look-Back). The Look-Ahead and Look-Back generations show richer and more intricate visual detail in the astronaut than standard sampling. In the rainy portrait, they produce more realistic raindrop details on the girl’s coat, making the scene more consistent with a rainy atmosphere, whereas standard sampling fails to capture such effects.
  • ...and 1 more figures