SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time
Stanislav Frolov, Brian B. Moser, Andreas Dengel
TL;DR
SpotDiffusion introduces a time-shifted, non-overlapping window diffusion approach for fast panorama generation. At each timestep it applies a random shift $s(t)$ to windows whose stride $\omega$ equals the window width $W$. Seams are therefore corrected across timesteps without overlapping denoising predictions, which dramatically reduces inference time while preserving image coherence. Empirical results show that the method is competitive with, and in some cases better than, MultiDiffusion, SyncDiffusion, and StitchDiffusion on quality metrics (FID, CLIPScore, ImageReward) while generating up to 6x faster. The method offers a practical, drop-in replacement for high-resolution diffusion-based panorama generation; the authors acknowledge its limitations and point to dynamic stride optimization as future work.
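The core idea can be illustrated with a minimal sketch of one denoising step. It assumes a latent of shape (C, H, total width) whose width is a multiple of $W$, a hypothetical `denoise_window` callable standing in for one step of a pretrained diffusion model, a shift $s(t)$ drawn uniformly from $[0, W)$, and a circular (wrap-around) shift. The boundary handling and shift distribution used in the authors' code may differ.

```python
import numpy as np


def spot_step(latent, t, window_w, denoise_window, rng):
    """One denoising step with time-shifted, non-overlapping windows (sketch).

    latent: array of shape (C, H, total_w); total_w must be a multiple of window_w.
    denoise_window: hypothetical callable (window, t) -> array of the same shape,
        standing in for one denoising step of a pretrained diffusion model.
    """
    total_w = latent.shape[-1]
    if total_w % window_w != 0:
        raise ValueError("panorama width must be a multiple of the window width")
    # Random horizontal offset s(t), drawn uniformly from [0, window_w) at every timestep.
    shift = int(rng.integers(0, window_w))
    # Circular shift (assumption): moves the window borders, i.e. the seams, to new columns.
    shifted = np.roll(latent, -shift, axis=-1)
    # Stride omega = W: windows tile the width with no overlap, so each column is predicted once.
    windows = np.split(shifted, total_w // window_w, axis=-1)
    denoised = [denoise_window(w, t) for w in windows]
    # Stitch the windows back together and undo the shift.
    return np.roll(np.concatenate(denoised, axis=-1), shift, axis=-1)


def sample(latent, timesteps, window_w, denoise_window, seed=0):
    """Apply spot_step once per timestep, drawing a fresh shift each time."""
    rng = np.random.default_rng(seed)
    for t in timesteps:
        latent = spot_step(latent, t, window_w, denoise_window, rng)
    return latent


# Usage with a dummy denoiser that records its calls and adds 1 to each window.
calls = []


def dummy_denoise(window, t):
    calls.append((t, window.shape))
    return window + 1.0


out = sample(np.zeros((4, 8, 32)), timesteps=[3, 2, 1], window_w=8, denoise_window=dummy_denoise)
print(out.shape, len(calls))  # (4, 8, 32) 12
```

Because the windows never overlap, each timestep costs exactly total width / $W$ denoiser calls, and no averaging is needed. Seam positions change from one step to the next only because of the random shift.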
Abstract
Generating high-resolution images with generative models has recently been made widely accessible by leveraging diffusion models pre-trained on large-scale datasets. Techniques such as MultiDiffusion and SyncDiffusion have further pushed image generation beyond the training resolution, i.e., from square images to panoramas, by merging multiple overlapping diffusion paths or by employing gradient descent to maintain perceptual coherence. However, these methods are computationally inefficient because they generate and average numerous overlapping predictions, which in practice is required to produce high-quality, seamless images. This work addresses this limitation and presents a novel approach that eliminates the need to generate and average numerous overlapping denoising predictions. Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next. This results in coherent, high-resolution images with fewer overall steps. We demonstrate the effectiveness of our approach through qualitative and quantitative evaluations, comparing it with MultiDiffusion, SyncDiffusion, and StitchDiffusion. Our method offers several key benefits, including improved computational efficiency and faster inference, while producing comparable or better image quality. Code: https://github.com/stanifrolov/spotdiffusion
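The efficiency argument can be made concrete by counting denoiser calls per timestep. The sizes below (a 256-wide panorama latent, 64-wide windows, and a stride of 8 for the overlapping baseline) are illustrative assumptions, not numbers taken from this work's experiments. The count ignores wall-clock overhead, batching, and the extra gradient computation SyncDiffusion performs on top of MultiDiffusion-style averaging.

```python
def overlapping_windows(total_w, window_w, stride):
    """MultiDiffusion-style windows: starts at 0, stride, ..., total_w - window_w."""
    return (total_w - window_w) // stride + 1


def shifted_nonoverlapping_windows(total_w, window_w):
    """Non-overlapping windows (stride = window_w), assuming wrap-around shifts."""
    return total_w // window_w


# Illustrative sizes (assumption): latent width 256, window width 64, overlap stride 8.
md = overlapping_windows(256, 64, 8)
sd = shifted_nonoverlapping_windows(256, 64)
print(md, sd, md / sd)  # 25 4 6.25
```

Under these assumptions, each timestep of the overlapping scheme needs about six times as many denoiser evaluations as the shifted non-overlapping scheme. Measured speedups also depend on implementation details.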
