SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time

Stanislav Frolov, Brian B. Moser, Andreas Dengel

TL;DR

SpotDiffusion introduces a time-shifted, non-overlapping window diffusion approach for fast panorama generation. By applying random shifts $s(t)$ to denoising windows with stride $\omega = W$ (stride equal to the window size, so windows do not overlap), it corrects seams across timesteps without overlapping denoising predictions, substantially reducing inference time while preserving image coherence. Empirical results are competitive with, and in some cases better than, MultiDiffusion, SyncDiffusion, and StitchDiffusion in quality metrics (FID, CLIPScore, ImageReward) and in speed, with up to 6x faster generation. The method offers a practical, drop-in replacement for high-resolution diffusion-based panorama generation, with acknowledged limitations and avenues for dynamic stride optimization.

Abstract

Generating high-resolution images with generative models has recently been made widely accessible by leveraging diffusion models pre-trained on large-scale datasets. Various techniques, such as MultiDiffusion and SyncDiffusion, have further pushed image generation beyond training resolutions, i.e., from square images to panoramas, by merging multiple overlapping diffusion paths or employing gradient descent to maintain perceptual coherence. However, these methods suffer from significant computational inefficiencies due to generating and averaging numerous predictions, which is required in practice to produce high-quality and seamless images. This work addresses this limitation and presents a novel approach that eliminates the need to generate and average numerous overlapping denoising predictions. Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next. This results in coherent, high-resolution images with fewer overall steps. We demonstrate the effectiveness of our approach through qualitative and quantitative evaluations, comparing it with MultiDiffusion, SyncDiffusion, and StitchDiffusion. Our method offers several key benefits, including improved computational efficiency and faster inference times while producing comparable or better image quality. Code: https://github.com/stanifrolov/spotdiffusion
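
The shifting idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cyclic wrap-around at the panorama border, the uniform sampling of the shift, and the placeholder `denoise_fn` (standing in for one denoising update of a pre-trained diffusion model) are all assumptions made for illustration.

```python
import numpy as np


def shifted_windows(width, window, shift):
    """Column indices of non-overlapping windows, cyclically shifted by `shift`.

    Assumption: windows wrap around the right border of the panorama.
    """
    assert width % window == 0, "sketch assumes width is a multiple of window"
    starts = (np.arange(0, width, window) + shift) % width
    return [(start + np.arange(window)) % width for start in starts]


def spot_step(latent, denoise_fn, window, shift):
    """One denoising step in which every column is processed exactly once."""
    out = np.empty_like(latent)
    for cols in shifted_windows(latent.shape[-1], window, shift):
        # Stride equals the window size, so no predictions need averaging.
        out[..., cols] = denoise_fn(latent[..., cols])
    return out


def generate(latent, denoise_fn, window, num_steps, seed=0):
    """Run `num_steps` steps, drawing a new random window shift at each step."""
    rng = np.random.default_rng(seed)
    for _ in range(num_steps):
        # Hypothetical shift schedule: uniform over one window width.
        shift = int(rng.integers(0, window))
        latent = spot_step(latent, denoise_fn, window, shift)
    return latent


if __name__ == "__main__":
    # Toy stand-in for a denoiser: halves the latent at every step.
    result = generate(np.ones((4, 8, 64)), lambda x: 0.5 * x, window=16, num_steps=3)
    print(result.shape, float(result.max()))
```

Because window borders fall at different columns at each step, a seam left at one step lies in the interior of a window at a later step, which is the mechanism the abstract describes.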

Paper Structure (16 sections, 1 equation, 6 figures, 1 table, 1 algorithm)

Figures (6)

  • Figure 1: MultiDiffusion [bar2023multidiffusion] can produce coherent panorama images by averaging overlapping denoising predictions. However, this process introduces computational inefficiencies and requires denoising many patch views. Without overlapping views, MultiDiffusion cannot produce coherent panoramas. We introduce an efficient method for high-resolution panorama generation that eliminates the need for overlapping denoising predictions, resulting in coherent and sharp images without border artifacts.
  • Figure 2: MultiDiffusion [bar2023multidiffusion] generates coherent panorama images by averaging overlapping denoising predictions with a stride that is smaller than the denoising window. Our method eliminates the need for overlapping denoising predictions and introduces a more efficient shifting denoising method. Instead of relying on a fixed denoising path with overlapping views, our method shifts the denoising windows over time, ensuring that seams in one timestep are corrected in the next. This results in fast, seamless, high-resolution images with fewer overall steps.
  • Figure 3: Comparison of MultiDiffusion [bar2023multidiffusion] with varying stride sizes and our approach. MultiDiffusion with stride=64 (no overlap between views) matches our image generation times but produces strong border artifacts and visible seams due to disjoint diffusion paths. With stride=32 (50% overlap), MultiDiffusion still shows visible seams, and only with stride=16 (75% overlap) does MultiDiffusion produce seamless panoramas, but at the cost of increased computation. In contrast, our method consistently achieves seamless panoramas, reducing inference time by 6x without overlapping denoising views, making it more efficient for high-resolution image generation.
  • Figure 4: Our method can also replace the inner MultiDiffusion [bar2023multidiffusion] loop in SyncDiffusion [lee2023syncdiffusion], leading to a 3x speedup in inference time without noticeable degradation in image quality. The generated panorama images are coherent and sharp without border artifacts, demonstrating that our shifted window denoising approach is effective without requiring many overlapping patches.
  • Figure 5: Left: The total number of required denoising steps, and thus image generation time, depends on the stride of the denoising windows. Given a default window size of 64, strides of [64, 32, 16] correspond to [0%, 50%, 75%] overlap between denoising windows, respectively. Middle: CLIPScore comparison of the base Stable Diffusion model with MultiDiffusion [bar2023multidiffusion] and our method. Our method reaches performance similar to MultiDiffusion [bar2023multidiffusion] but is significantly faster. Right: FID comparison of SyncDiffusion [lee2023syncdiffusion] with our method. Our method achieves FID scores similar to SyncDiffusion [lee2023syncdiffusion] in a fraction of the time. Notably, our method does not require overlapping denoising windows (window size = stride = 64) and subsequent averaging, making it more efficient.
  • ...and 1 more figure
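
The relationship between stride, overlap, and cost mentioned in the Figure 3 and Figure 5 captions can be worked through with a short calculation. The window size of 64 and the strides come from the captions; the latent width of 512 and the sliding-window count without wrap-around are assumptions chosen for illustration, so the resulting counts are not the paper's reported timings.

```python
def overlap_fraction(window, stride):
    """Fraction of a window shared with its neighbour at the given stride."""
    return 1.0 - stride / window


def num_views(width, window, stride):
    """Number of sliding windows needed to cover `width` (no wrap-around)."""
    return (width - window) // stride + 1


WINDOW = 64         # default window size stated in the Figure 5 caption
LATENT_WIDTH = 512  # assumed panorama latent width, for illustration only

for stride in (64, 32, 16):
    overlap = overlap_fraction(WINDOW, stride)
    views = num_views(LATENT_WIDTH, WINDOW, stride)
    print(f"stride={stride:2d}  overlap={overlap:.0%}  views per step={views}")
# stride=64  overlap=0%  views per step=8
# stride=32  overlap=50%  views per step=15
# stride=16  overlap=75%  views per step=29
```

Under these assumptions, each denoising step at stride 16 evaluates the model on 29 windows, compared with 8 when the stride equals the window size, which is why removing the need for overlap reduces generation time.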