Guided and Variance-Corrected Fusion with One-shot Style Alignment for Large-Content Image Generation

Shoukun Sun, Min Xian, Tiankai Yao, Fei Xu, Luca Capriotti

TL;DR

The paper addresses the challenge of producing large-content images with inexpensive, small diffusion models, where stitching overlapped patches often introduces seams, object discontinuities, and degraded detail. It introduces three components: Guided Fusion (GF), which uses a center-focused guidance map to weight the fusion of overlapping regions; Variance-Corrected Fusion (VCF), which restores the correct variance when patches are averaged under SDE samplers; and one-shot Style Alignment (SA), which harmonizes the initial noises via spherical linear interpolation ($\text{slerp}$) without extra compute. Together, these methods reduce seams and improve image fidelity and content coherence in panorama generation, with quantitative gains in FID, KID, GIQA scores, and CLIP alignment, and a substantial speed advantage for SA over gradient-based approaches. The work positions GF, VCF, and SA as plug-and-play modules that can enhance other fusion-based large-image generation methods, and provides code for broad adoption. The practical impact lies in enabling higher-quality, scalable large-image synthesis using existing compact diffusion models.
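
As a rough illustration of the Style Alignment idea summarized above, the following sketch pulls each patch's initial noise toward a shared reference noise with $\text{slerp}$. It is only a minimal sketch under stated assumptions: the use of a single shared reference, the helper names `slerp` and `align_initial_noises`, the tensor shape, and the value `alpha=0.3` are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def slerp(a, b, alpha):
    """Spherical linear interpolation from a (alpha=0) to b (alpha=1)."""
    a_flat, b_flat = a.ravel(), b.ravel()
    cos_omega = np.dot(a_flat, b_flat) / (
        np.linalg.norm(a_flat) * np.linalg.norm(b_flat)
    )
    omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))
    sin_omega = np.sin(omega)
    if np.isclose(sin_omega, 0.0):
        # Nearly parallel vectors: fall back to plain linear interpolation.
        return (1.0 - alpha) * a + alpha * b
    return (
        np.sin((1.0 - alpha) * omega) * a + np.sin(alpha * omega) * b
    ) / sin_omega


def align_initial_noises(noises, reference, alpha):
    """Hypothetical helper: move every patch noise toward one reference noise."""
    return [slerp(noise, reference, alpha) for noise in noises]


# Illustrative setup (assumed shapes): three latent patches of shape (4, 64, 64).
rng = np.random.default_rng(0)
reference = rng.standard_normal((4, 64, 64))
noises = [rng.standard_normal((4, 64, 64)) for _ in range(3)]
aligned = align_initial_noises(noises, reference, alpha=0.3)
```

Unlike plain linear interpolation, $\text{slerp}$ between two vectors of similar norm keeps the norm of the result roughly unchanged, so the aligned noises stay close to unit variance. Because the adjustment happens once on the initial noise, it adds no cost to the sampling loop in this sketch.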

Abstract

Producing large images using small diffusion models is gaining popularity, as the cost of training large models can be prohibitive. A common approach is to jointly generate a series of overlapping image patches and obtain a large image by merging adjacent patches. However, results from existing methods often exhibit noticeable artifacts, e.g., seams and inconsistent objects and styles. To address these issues, we propose Guided Fusion (GF), which mitigates the negative impact of distant image regions by applying a weighted average to the overlapping regions. Moreover, we propose Variance-Corrected Fusion (VCF), which corrects the data variance after averaging, yielding more accurate fusion for the Denoising Diffusion Probabilistic Model. Furthermore, we propose a one-shot Style Alignment (SA), which produces a coherent style across large images by adjusting the initial input noise without adding extra computational burden. Extensive experiments demonstrate that the proposed fusion methods significantly improve the quality of the generated images. The proposed method can be widely applied as a plug-and-play module to enhance other fusion-based methods for large image generation. Code: https://github.com/TitorX/GVCFDiffusion
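
To make the fusion ideas in the abstract concrete, here is a minimal 1-D sketch of center-weighted averaging of overlapping patches, followed by a rescaling that restores unit variance for independently sampled noise. Everything in it is an assumption made for illustration: the Gaussian-shaped weight in `center_weight`, the helper names, and the correction factor $\sum_i w_i / \sqrt{\sum_i w_i^2}$, which is a standard way to undo the variance shrinkage of a weighted mean of independent unit-variance samples. The paper's actual guidance map and correction may differ.

```python
import numpy as np


def center_weight(width, sharpness=2.0):
    """Hypothetical guidance map: largest at the patch center, smaller at the edges."""
    x = np.linspace(-1.0, 1.0, width)
    return np.exp(-sharpness * x**2)


def fuse_patches(patches, starts, total_width, weight):
    """Weighted average of overlapping patches along the last axis."""
    lead_shape = patches[0].shape[:-1]
    num = np.zeros(lead_shape + (total_width,))
    w_sum = np.zeros(total_width)
    w_sq_sum = np.zeros(total_width)
    for patch, start in zip(patches, starts):
        sl = slice(start, start + patch.shape[-1])
        num[..., sl] += weight * patch
        w_sum[sl] += weight
        w_sq_sum[sl] += weight**2
    return num / w_sum, w_sum, w_sq_sum


def correct_noise_variance(fused_noise, w_sum, w_sq_sum):
    """Rescale averaged zero-mean noise so its variance returns to 1.

    Averaging independent unit-variance noise with weights w_i yields variance
    sum(w_i^2) / sum(w_i)^2, so multiplying by sum(w_i) / sqrt(sum(w_i^2)) undoes it.
    """
    return fused_noise * w_sum / np.sqrt(w_sq_sum)


# Illustrative check on pure noise: three patches of width 64 with stride 32.
rng = np.random.default_rng(0)
width, stride, n_patches, n_samples = 64, 32, 3, 10000
starts = [i * stride for i in range(n_patches)]
total_width = starts[-1] + width
weight = center_weight(width)
noise_patches = [rng.standard_normal((n_samples, width)) for _ in starts]

plain, w_sum, w_sq_sum = fuse_patches(noise_patches, starts, total_width, weight)
corrected = correct_noise_variance(plain, w_sum, w_sq_sum)
print("min variance, plain average:", plain.var(axis=0).min())
print("min variance, corrected:", corrected.var(axis=0).min())
```

In this sketch the correction is applied only to the zero-mean noise term; applying the same factor to a signal with a nonzero mean would also rescale the mean. Where a position is covered by a single patch, the factor is exactly 1 and the patch passes through unchanged.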

Paper Structure

This paper contains 11 sections, 15 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Comparisons of panorama images generated by MultiDiffusion (Bar-Tal et al., 2023), SyncDiffusion (Lee et al., 2023), and our methods: Guided Fusion (GF), Variance-Corrected Fusion (VCF), and Style Alignment (SA). All images are generated with the same initial noise. The red boxes highlight the discontinuous and defective areas in the generated images.
  • Figure 2: Guided Fusion Map.
  • Figure 3: Images produced by direct averaging overlapped areas with DDIM and DDPM sampler, and a result from DDPM with Variance-Corrected Fusion (VCF).
  • Figure 4: MultiDiffusion (MD) compared with Guided Fusion (GF) with different strides. All images are generated with the same initial noise.
  • Figure 5: Image quality and diversity assessment using Style Alignment (SA) with different $\alpha$ values. The DDIM sampler is used.
  • ...and 2 more figures