NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M. Patel, Vitor Guizilini, Rowan McAllister

TL;DR

This work introduces Phase-Preserving Diffusion (φ-PD), a diffusion framework that preserves input phase while randomizing magnitude to maintain spatial structure during generation, enabling structure-aligned results without architectural changes. It introduces Frequency-Selective Structured (FSS) noise to controllably balance structure preservation and creative flexibility through a single cutoff parameter. The approach is model-agnostic and extends to videos, showing improvements in photorealistic and stylized re-rendering, as well as sim-to-real driving enhancement with notable gains in downstream planner performance (e.g., ~50% improvement in CARLA-to-Waymo transfer). The method achieves these benefits with no inference-time overhead and remains compatible with DDPMs and flow-matching, offering a lightweight alternative to heavier conditioning modules.

Abstract

Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page (https://yuzeng-at-tri.github.io/ppd-page/).
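
The core idea in the abstract, noise that keeps the input's Fourier phase while taking its magnitude from Gaussian noise, can be illustrated with a short NumPy sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `phase_preserving_noise`, the single-channel 2D input, and the use of `np.fft.fft2` are all choices made here for clarity.

```python
import numpy as np

def phase_preserving_noise(image, rng):
    """Hypothetical helper: noise whose Fourier magnitude comes from
    Gaussian noise and whose Fourier phase comes from `image`.

    Assumes `image` is a real-valued 2D array (one channel).
    """
    # Ordinary Gaussian noise: random magnitude AND random phase.
    eps = rng.standard_normal(image.shape)

    # Keep only the phase of the input image.
    image_phase = np.angle(np.fft.fft2(image))

    # Keep only the magnitude of the Gaussian noise.
    noise_magnitude = np.abs(np.fft.fft2(eps))

    # Recombine: random magnitude, preserved phase.
    spectrum = noise_magnitude * np.exp(1j * image_phase)

    # Both factors come from real signals, so the spectrum is Hermitian
    # and the inverse transform is real up to floating-point error.
    return np.real(np.fft.ifft2(spectrum))


rng = np.random.default_rng(0)
image = rng.random((16, 16))          # stand-in for a real input image
structured = phase_preserving_noise(image, rng)
```

In this sketch the result would take the place of the plain Gaussian sample wherever the forward process adds noise. How the noise is scheduled and mixed with the data is left to the underlying diffusion or flow-matching model and is not shown here.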

Paper Structure

This paper contains 18 sections, 21 equations, 11 figures, 2 tables.

Figures (11)

  • Figure 1: We present Phase-Preserving Diffusion ($\phi$-PD), a model-agnostic reformulation of the diffusion process that preserves an image's phase while randomizing its magnitude, enabling structure-aligned generation with no architectural changes or additional parameters.
  • Figure 2: Unlike prior approaches that modify architectures and add overhead, $\phi$-PD preserves structure via phase consistency, remaining lightweight and model-agnostic, reflecting that image-conditioned generation should be simpler, not harder.
  • Figure 3: Mixing phase and magnitude from two images. The mixture keeps the structure of the image from which the phase is taken.
  • Figure 4: Frequency-Selective Structured (FSS) noise with increasing cutoff radius $r$.
  • Figure 5: Images generated with the same noise and different cutoff radii $r$. Results are based on SD1.5.
  • ...and 6 more figures
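
The cutoff radius $r$ in the FSS noise captions (Figures 4 and 5) can be illustrated with a small NumPy sketch. The exact masking rule is not given in this summary, so the convention below is an assumption: frequencies within radius `r` of the spectrum origin keep the input image's phase, and all other frequencies keep the Gaussian noise's own random phase. The function name `fss_noise`, the measurement of `r` in FFT index units, and the hard (non-smoothed) mask are likewise illustrative choices.

```python
import numpy as np

def fss_noise(image, r, rng):
    """Hypothetical sketch of frequency-selective structured noise.

    ASSUMPTION: frequencies with radius <= r take the image's phase;
    the rest keep the Gaussian noise's phase. Magnitude always comes
    from the Gaussian noise. `image` is a real-valued 2D array.
    """
    h, w = image.shape
    eps = rng.standard_normal((h, w))
    f_img = np.fft.fft2(image)
    f_eps = np.fft.fft2(eps)

    # Radial distance of each FFT bin from the zero frequency,
    # in index units (0, 1, 2, ..., with negative frequencies wrapped).
    fy = np.fft.fftfreq(h) * h
    fx = np.fft.fftfreq(w) * w
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

    # The mask is symmetric under frequency negation, so the mixed
    # spectrum stays Hermitian and the output stays real.
    keep_image_phase = radius <= r
    phase = np.where(keep_image_phase, np.angle(f_img), np.angle(f_eps))

    spectrum = np.abs(f_eps) * np.exp(1j * phase)
    return np.real(np.fft.ifft2(spectrum))


image = np.random.default_rng(1).random((16, 16))   # stand-in input
loose = fss_noise(image, r=2.0, rng=np.random.default_rng(0))
rigid = fss_noise(image, r=100.0, rng=np.random.default_rng(0))
```

Under this assumed convention, a negative `r` reproduces plain Gaussian noise, and an `r` larger than the highest frequency reproduces fully phase-preserving noise, so a single parameter moves between the two extremes.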