Training-free Mixed-Resolution Latent Upsampling for Spatially Accelerated Diffusion Transformers

Wongi Jeong; Kyungryeol Lee; Hoigi Seo; Se Young Chun

TL;DR

This work investigates spatial acceleration for DiTs via latent upsampling and proposes Region-Adaptive Latent Upsampling (RALU), a training-free framework that uses mixed-resolution latent upsampling to accelerate DiTs spatially while mitigating the artifacts that naïve upsampling introduces.

Abstract

Diffusion transformers (DiTs) offer excellent scalability for high-fidelity generation, but their computational overhead poses a major challenge for practical deployment. Existing acceleration methods primarily exploit the temporal dimension, whereas spatial acceleration remains underexplored. In this work, we investigate spatial acceleration for DiTs via latent upsampling. We find that naïve latent upsampling introduces artifacts, primarily due to aliasing in high-frequency edge regions and mismatches caused by noise-timestep discrepancies. Based on these findings, we propose a training-free spatial acceleration framework, dubbed Region-Adaptive Latent Upsampling (RALU), that mitigates those artifacts while accelerating DiTs spatially through mixed-resolution latent upsampling. RALU achieves artifact-free, efficient acceleration by upsampling early only in artifact-prone edge regions and by matching noise and timesteps across latent resolutions, yielding up to 7.0$\times$ speedup on FLUX-1.dev and 3.0$\times$ on Stable Diffusion 3 with negligible quality degradation. Furthermore, RALU is complementary to existing temporal acceleration methods and timestep-distilled models, and combining them yields up to 15.9$\times$ speedup.
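To make the idea of upsampling only edge regions concrete, here is a minimal sketch of how a mixed-resolution token budget might be computed. It is not the authors' implementation: the abstract does not say how edge regions are detected, so the gradient-energy score, the `top_frac` cutoff, and the function names `edge_patch_mask` and `mixed_token_count` are all assumptions made for illustration. Noise-timestep matching is not covered.

```python
# Illustrative sketch only: the edge criterion (forward-difference gradient
# energy) and all names here are assumptions, not the method from the paper.

def edge_patch_mask(latent, patch=2, top_frac=0.25):
    """Flag the patches of a 2D single-channel latent with the highest gradient energy.

    latent: list of rows (H x W); H and W are assumed divisible by `patch`.
    Returns a (H // patch) x (W // patch) grid of booleans.
    """
    h, w = len(latent), len(latent[0])

    def energy(y, x):
        # Forward differences, clamped at the border.
        dx = latent[y][min(x + 1, w - 1)] - latent[y][x]
        dy = latent[min(y + 1, h - 1)][x] - latent[y][x]
        return (dx * dx + dy * dy) ** 0.5

    ph, pw = h // patch, w // patch
    # Mean gradient energy inside each non-overlapping patch.
    scores = [
        [
            sum(energy(py * patch + i, px * patch + j)
                for i in range(patch) for j in range(patch)) / (patch * patch)
            for px in range(pw)
        ]
        for py in range(ph)
    ]
    ranked = sorted((s for row in scores for s in row), reverse=True)
    k = max(1, round(top_frac * len(ranked)))
    threshold = ranked[k - 1]
    # Ties at the threshold are all kept; completely flat patches never are.
    return [[s >= threshold and s > 0 for s in row] for row in scores]


def mixed_token_count(mask, factor=2):
    """Token count when only flagged patches are upsampled by `factor` per side."""
    n_edge = sum(v for row in mask for v in row)
    n_total = sum(len(row) for row in mask)
    return n_edge * factor * factor + (n_total - n_edge)


# Toy latent: an 8x8 grid with a vertical step edge between columns 3 and 4.
latent = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]
mask = edge_patch_mask(latent, patch=2, top_frac=0.25)
tokens_mixed = mixed_token_count(mask, factor=2)
tokens_full = len(mask) * len(mask[0]) * 4  # every patch upsampled 2x per side
```

In this toy case only the patch column that contains the step is flagged, so the mixed-resolution layout uses 28 tokens where a uniform 2x upsampling would use 64. Since attention cost grows with token count, this is the kind of saving that upsampling edge regions alone is meant to capture.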
