AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
Zhihang Lin, Mingbao Lin, Meng Zhao, Rongrong Ji
TL;DR
The paper tackles object repetition in patch-wise higher-resolution image generation with diffusion models. It introduces AccDiffusion, which decouples the image-level prompt into patch-content-aware prompts derived from cross-attention maps and adds dilated sampling with window interaction to improve global consistency. In training-free extrapolation experiments, AccDiffusion achieves state-of-the-art metrics and avoids object repetition more clearly than baselines such as MultiDiffusion, ScaleCrafter, and DemoFusion. Because it requires no additional training, the approach is practical for applications that need detailed, coherent imagery at large scales.
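The prompt-decoupling idea can be illustrated with a minimal sketch. It assumes that every word already has a cross-attention map upsampled to the grid the patches are cut from, binarizes each map at its own mean, and keeps a word in a patch's prompt only when its mask covers at least `min_ratio` of that patch. The mean threshold, `min_ratio`, and the helper names `binarize` and `patch_prompts` are hypothetical choices for illustration, not the paper's exact procedure or code.

```python
from typing import Dict, List, Tuple

Grid = List[List[float]]


def binarize(attn: Grid) -> List[List[bool]]:
    """Threshold an attention map at its own mean (an assumed rule)."""
    values = [v for row in attn for v in row]
    mean = sum(values) / len(values)
    return [[v > mean for v in row] for row in attn]


def patch_prompts(
    words: List[str],
    attn_maps: Dict[str, Grid],
    patch: int,
    stride: int,
    min_ratio: float = 0.2,  # hypothetical coverage threshold
) -> Dict[Tuple[int, int], str]:
    """Map each patch's top-left corner to a prompt built from the words it contains.

    attn_maps[w] is assumed to be word w's cross-attention map, already
    upsampled to the same height x width grid the patches are cut from.
    """
    masks = {w: binarize(attn_maps[w]) for w in words}
    height, width = len(attn_maps[words[0]]), len(attn_maps[words[0]][0])
    prompts: Dict[Tuple[int, int], str] = {}
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            kept = []
            for w in words:  # preserve the original word order
                covered = sum(
                    masks[w][r][c]
                    for r in range(top, top + patch)
                    for c in range(left, left + patch)
                )
                # Keep a word only if its mask covers enough of this patch.
                if covered / (patch * patch) >= min_ratio:
                    kept.append(w)
            prompts[(top, left)] = " ".join(kept)
    return prompts


if __name__ == "__main__":
    cat = [[1.0, 1.0, 0.0, 0.0] for _ in range(4)]  # "cat" attends to the left half
    dog = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]  # "dog" attends to the right half
    print(patch_prompts(["cat", "dog"], {"cat": cat, "dog": dog}, patch=2, stride=2))
    # {(0, 0): 'cat', (0, 2): 'dog', (2, 0): 'cat', (2, 2): 'dog'}
```

Patches on the left receive only "cat" and patches on the right only "dog", so no patch is asked to draw an object it does not contain. With overlapping patches (`stride=1`), a patch straddling the boundary receives both words.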
Abstract
This paper attempts to address the object repetition issue in patch-wise higher-resolution image generation. We propose AccDiffusion, an accurate method for patch-wise higher-resolution image generation without training. An in-depth analysis in this paper reveals that using an identical text prompt for different patches causes repeated object generation, while using no prompt compromises image details. Therefore, AccDiffusion, for the first time, decouples the vanilla image-content-aware prompt into a set of patch-content-aware prompts, each of which serves as a more precise description of an image patch. In addition, AccDiffusion introduces dilated sampling with window interaction for better global consistency in higher-resolution image generation. Experimental comparison with existing methods demonstrates that AccDiffusion effectively addresses repeated object generation and yields better performance in higher-resolution image generation.
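Dilated sampling with window interaction can be sketched on a single-channel grid. Plain dilated sampling splits a latent into s×s strided sub-grids (`z[di::s, dj::s]` in NumPy notation); the window-interaction step, as we read it, applies a random bijection to the s² positions inside each s×s window before the split and inverts it after denoising, so each sub-grid mixes samples drawn from different offsets. The names `split_dilated`, `merge_dilated`, and `fake_denoise`, the uniform per-window shuffle, and the stand-in denoiser are assumptions of this sketch, not AccDiffusion's implementation.

```python
import random
from typing import Dict, List, Tuple

Grid = List[List[float]]
Perms = Dict[Tuple[int, int], List[int]]


def split_dilated(z: Grid, s: int, rng: random.Random) -> Tuple[List[Grid], Perms]:
    """Split z into s*s dilated sub-grids with a random shuffle inside each s x s window."""
    h, w = len(z), len(z[0])
    if h % s or w % s:
        raise ValueError("grid size must be divisible by the dilation factor s")
    hs, ws = h // s, w // s
    subs = [[[0.0] * ws for _ in range(hs)] for _ in range(s * s)]
    perms: Perms = {}
    for bi in range(hs):
        for bj in range(ws):
            # Window interaction (assumed form): a random bijection over the
            # s*s offsets of this window. With the identity permutation,
            # sub-grid k is plain dilated sampling at offset divmod(k, s).
            perm = list(range(s * s))
            rng.shuffle(perm)
            perms[(bi, bj)] = perm
            for k, p in enumerate(perm):
                di, dj = divmod(p, s)
                subs[k][bi][bj] = z[bi * s + di][bj * s + dj]
    return subs, perms


def merge_dilated(subs: List[Grid], perms: Perms, s: int) -> Grid:
    """Invert split_dilated by writing each value back to its original offset."""
    hs, ws = len(subs[0]), len(subs[0][0])
    z = [[0.0] * (ws * s) for _ in range(hs * s)]
    for (bi, bj), perm in perms.items():
        for k, p in enumerate(perm):
            di, dj = divmod(p, s)
            z[bi * s + di][bj * s + dj] = subs[k][bi][bj]
    return z


def fake_denoise(g: Grid) -> Grid:
    """Hypothetical stand-in for one denoising step on a sub-grid."""
    return [[0.5 * v for v in row] for row in g]


if __name__ == "__main__":
    z = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    subs, perms = split_dilated(z, s=2, rng=random.Random(0))
    out = merge_dilated([fake_denoise(g) for g in subs], perms, s=2)
    print(out == [[0.5 * v for v in row] for row in z])  # True
```

Because the shuffle is a bijection within each window, `merge_dilated` restores every value to its original position exactly; a real pipeline would run the diffusion model on each entry of `subs` between the two calls.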
