ASGDiffusion: Parallel High-Resolution Generation with Asynchronous Structure Guidance

Yuming Li, Peidong Jia, Daiwei Hong, Yueru Jia, Qi She, Rui Zhao, Ming Lu, Shanghang Zhang

TL;DR

ASGDiffusion is a training-free method for high-resolution (HR) image generation with pre-trained diffusion models. It reduces pattern repetition by guiding each denoising step with low-resolution structure weighted by a cross-attention mask. Its asynchronous structure guidance strategy lets patch noise and guidance be computed in parallel across multiple GPUs, which speeds up HR generation and lowers per-GPU memory use while maintaining semantic coherence. The method integrates with multiple Stable Diffusion variants and delivers strong qualitative and quantitative results, particularly at resolutions such as 2048×2048 and 3072×3072. Limits remain at ultra-high resolutions, but ASGDiffusion offers a practical, scalable approach to fast, high-quality HR diffusion without additional training.

Abstract

Training-free high-resolution (HR) image generation has garnered significant attention due to the high costs of training large diffusion models. Most existing methods begin by reconstructing the overall structure and then proceed to refine the local details. Despite their advancements, they still face issues with repetitive patterns in HR image generation. Besides, HR generation with diffusion models incurs significant computational costs. Thus, parallel generation is essential for interactive applications. To solve the above limitations, we introduce a novel method named ASGDiffusion for parallel HR generation with Asynchronous Structure Guidance (ASG) using pre-trained diffusion models. To solve the pattern repetition problem of HR image generation, ASGDiffusion leverages the low-resolution (LR) noise weighted by the attention mask as the structure guidance for the denoising step to ensure semantic consistency. The proposed structure guidance can significantly alleviate the pattern repetition problem. To enable parallel generation, we further propose a parallelism strategy, which calculates the patch noises and structure guidance asynchronously. By leveraging multi-GPU parallel acceleration, we significantly accelerate generation speed and reduce memory usage per GPU. Extensive experiments demonstrate that our method effectively and efficiently addresses common issues like pattern repetition and achieves state-of-the-art HR generation.
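To make the idea of "LR noise weighted by the attention mask" concrete, here is a minimal sketch in plain Python. It is not the paper's formula: the linear blend, the nearest-neighbour upsampling, the `weight` parameter, and the function names `nearest_upsample`, `crop`, and `guide_patch` are all assumptions chosen for illustration. The sketch only shows the general shape of the operation: where the mask is high, the patch noise is pulled toward the upsampled LR noise; where the mask is zero, the patch noise is left unchanged.

```python
def nearest_upsample(grid, factor):
    """Upsample a 2D list by repeating every cell `factor` times on both axes."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def crop(grid, top, left, size):
    """Return the size x size window of `grid` whose corner is (top, left)."""
    return [row[left:left + size] for row in grid[top:top + size]]


def guide_patch(patch_noise, lr_noise, mask, top, left, factor, weight=0.5):
    """Blend HR patch noise with mask-weighted LR noise.

    ASSUMPTION: the blend (1 - w*m) * patch + w*m * lr is a hypothetical
    stand-in; the source does not state the exact guidance equation.
    """
    size = len(patch_noise)
    lr_up = crop(nearest_upsample(lr_noise, factor), top, left, size)
    mask_up = crop(nearest_upsample(mask, factor), top, left, size)
    return [
        [(1 - weight * m) * p + weight * m * g
         for p, g, m in zip(p_row, g_row, m_row)]
        for p_row, g_row, m_row in zip(patch_noise, lr_up, mask_up)
    ]


# Toy 2x2 LR noise and mask, upsampled 2x to a 4x4 HR grid.
lr_noise = [[1.0, 2.0], [3.0, 4.0]]
mask = [[1.0, 0.0], [0.0, 1.0]]

# The top-left patch lies under mask = 1, so it is pulled toward the LR noise.
guided = guide_patch([[0.0, 0.0], [0.0, 0.0]], lr_noise, mask, 0, 0, 2)

# The top-right patch lies under mask = 0, so it is returned unchanged.
untouched = guide_patch([[7.0, 7.0], [7.0, 7.0]], lr_noise, mask, 0, 2, 2)
```

In a real pipeline the noise tensors would be multi-channel latents and the mask would come from the model's cross-attention maps; the scalar grids here only keep the example self-contained.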

Paper Structure

This paper contains 21 sections, 6 equations, 12 figures, 3 tables.

Figures (12)

  • Figure 1: Samples generated by ASGDiffusion based on Stable Diffusion 3 (SD3). While SD3 can synthesize images up to 1024×1024, our method enhances SD3's capability to generate images at resolutions exceeding 1024×1024 without requiring fine-tuning or high memory usage. Best viewed by zooming in.
  • Figure 2: Comparison of generated images, inference time, and GPU cost for different methods at 2048×2048 resolution on an RTX 4090. Our method (ASGDiffusion) is the fastest and supports parallel processing.
  • Figure 3: The pipeline of ASGDiffusion. Following recent works, our method consists of two stages. In the first stage, we refine the overall structure with the proposed asynchronous structure guidance (ASG). In the second stage, we recover the details to produce the final image. The right side illustrates structure guidance with the cross-attention mask. We introduce a parallelism strategy to make the structure guidance asynchronous, allowing multi-GPU parallel acceleration.
  • Figure 4: Timeline visualization of asynchronous structure guidance (ASG). "Comm." denotes communication. The communication overhead is fully hidden within the computation.
  • Figure 5: Comparison of different methods. (a) SDXL+BSRGAN, (b) MultiDiffusion, (c) ScaleCrafter, (d) DemoFusion, (e) CutDiffusion, (f) ASGDiffusion (Ours). MultiDiffusion, ScaleCrafter, and DemoFusion fail to solve the pattern repetition problem in HR generation. Our method, ASGDiffusion, refines the overall structure using structure guidance. Additionally, we propose a parallelism strategy that makes the structure guidance asynchronous, enabling multi-GPU acceleration.
  • ...and 7 more figures
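
The captions for Figures 3 and 4 describe structure guidance being computed asynchronously so that its cost is hidden behind patch computation. The following toy sketch shows one way such overlap can be arranged, using Python threads in place of GPUs. It rests on assumptions not confirmed by the text above: that each step consumes the guidance produced during the previous step, and that guidance and patch noise can be launched concurrently. The functions `predict_patch_noise`, `compute_guidance`, and `run_async`, and the scalar "patches", are hypothetical stand-ins rather than the authors' implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_patch_noise(patch, step):
    """Hypothetical stand-in for one denoiser call on one HR patch."""
    return patch * 0.1 + step


def compute_guidance(summary, step):
    """Hypothetical stand-in for the LR structure-guidance computation."""
    return summary * 0.01 + step


def run_async(patches, num_steps, weight=0.5):
    """Toy denoising loop that overlaps guidance with patch computation.

    ASSUMPTION: step t uses the guidance finished during step t-1, so the
    patch workers never wait for the current step's guidance.
    """
    guidance = 0.0          # nothing is available before the first step
    used_guidance = []      # records which guidance value each step consumed
    with ThreadPoolExecutor(max_workers=len(patches) + 1) as pool:
        for step in range(num_steps):
            summary = sum(patches) / len(patches)
            # Launch this step's guidance without blocking on it.
            guidance_future = pool.submit(compute_guidance, summary, step)
            # Patch noises run concurrently, one task per patch.
            noise_futures = [pool.submit(predict_patch_noise, p, step)
                             for p in patches]
            noises = [f.result() for f in noise_futures]
            # Update patches with the stale guidance from the previous step.
            used_guidance.append(guidance)
            patches = [p - (1 - weight) * n - weight * guidance
                       for p, n in zip(patches, noises)]
            # Collect the new guidance only after the patch work is done.
            guidance = guidance_future.result()
    return patches, used_guidance


final_patches, used_guidance = run_async([1.0, 3.0], num_steps=3)
```

The point of the design is that the guidance result is only collected after the patch work for the step has finished, so the guidance task does not sit on the critical path as long as it is no slower than the patch tasks. On real hardware the thread pool would be replaced by per-GPU workers and non-blocking communication.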