$\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
Minh-Quan Le, Alexandros Graikos, Srikar Yellapragada, Rajarsi Gupta, Joel Saltz, Dimitris Samaras
TL;DR
This paper tackles controllable, high-resolution image synthesis in domains that require very large images, where traditional finite-dimensional diffusion models and patch-based methods struggle to preserve global structure or to scale efficiently. It introduces $\infty$-Brush, a conditional diffusion model operating in a function space $\mathcal{H}$ that uses a cross-attention neural operator for conditioning in $\mathcal{H}$. The model generates images at arbitrary resolutions up to $4096\times4096$ while training on only $0.4\%$ of pixels via a smoothing operator $\mathbf{A}$. Key contributions include the first conditional diffusion framework in infinite dimensions, the cross-attention neural operator for function-space conditioning, and a two-level denoiser (sparse grid) that maintains global coherence and local detail in large-scale generation. Empirical results on histopathology and satellite imagery demonstrate strong global-structure fidelity (CLIP-FID) and competitive local detail (Crop-FID), with favorable computational efficiency compared to finite-dimensional baselines.
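Training on a small fraction of pixels means the model sees an image as a set of sampled points rather than a full grid. As a rough illustration, and not the paper's procedure (it omits the smoothing operator $\mathbf{A}$ and any sparse-grid logic), the sketch below treats an image as a function on $[0,1]^2$ and draws a random point set covering a given fraction of its pixels. The helper name `sample_sparse_pixels`, the pixel-center coordinate convention, and the $512\times512$ example size are our own assumptions.

```python
import numpy as np

def sample_sparse_pixels(image, fraction, rng):
    """Hypothetical helper: pick a random subset of pixels from an H x W x C image.

    Returns normalized (y, x) coordinates in [0, 1] and the pixel values there,
    i.e. one discretization of the image viewed as a function on [0, 1]^2.
    """
    h, w = image.shape[:2]
    n_total = h * w
    n_keep = max(1, int(round(fraction * n_total)))
    # Sample flat indices without replacement so no pixel is drawn twice.
    flat = rng.choice(n_total, size=n_keep, replace=False)
    rows, cols = np.divmod(flat, w)
    # Map integer pixel indices to pixel centers in the unit square.
    coords = np.stack([(rows + 0.5) / h, (cols + 0.5) / w], axis=-1)
    values = image[rows, cols]
    return coords, values

rng = np.random.default_rng(0)
image = rng.random((512, 512, 3))  # stand-in for a real RGB image
coords, values = sample_sparse_pixels(image, fraction=0.004, rng=rng)
print(coords.shape, values.shape)  # (1049, 2) (1049, 3)
```

At a 0.4% fraction, a $512\times512$ image yields 1,049 points; a function-space model can be trained on such point sets instead of the full pixel grid.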
Abstract
Synthesizing high-resolution images from intricate, domain-specific information remains a significant challenge in generative modeling, particularly for applications in large-image domains such as digital histopathology and remote sensing. Existing methods face critical limitations: conditional diffusion models in pixel or latent space cannot exceed the resolution on which they were trained without losing fidelity, and their computational demands increase significantly for larger image sizes. Patch-based methods offer computational efficiency but fail to capture long-range spatial relationships because they rely too heavily on local information. In this paper, we introduce $\infty$-Brush, a novel conditional diffusion model in infinite dimensions for controllable large image synthesis. We propose a cross-attention neural operator to enable conditioning in function space. Our model overcomes the constraints of traditional finite-dimensional diffusion models and patch-based methods, offering scalability and a superior ability to preserve global image structures while maintaining fine details. To the best of our knowledge, $\infty$-Brush is the first conditional diffusion model in function space that can controllably synthesize images at arbitrary resolutions of up to $4096\times4096$ pixels. The code is available at https://github.com/cvlab-stonybrook/infinity-brush.
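To make conditioning in function space concrete, here is a minimal NumPy sketch of a single cross-attention layer whose queries are computed independently at each coordinate, while keys and values come from a fixed set of conditioning tokens. It is not the paper's cross-attention neural operator: the Fourier-feature encoding, the random weights, the token count, and all function names are illustrative assumptions. The noisy function values that a real denoiser would also consume are omitted. What the sketch shows is that the same weights apply to a point set of any size, so the layer can be evaluated on a $32\times32$ grid, a $64\times64$ grid, or an arbitrary subset of points.

```python
import numpy as np

def fourier_features(coords, n_freqs):
    # Encode continuous (y, x) coordinates with sin/cos at octave frequencies.
    freqs = 2.0 ** np.arange(n_freqs) * np.pi                          # (F,)
    angles = coords[..., None] * freqs                                 # (N, 2, F)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 2, 2F)
    return feats.reshape(coords.shape[0], -1)                          # (N, 4F)

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(point_feats, cond_tokens, w_q, w_k, w_v):
    # Queries come from the query points; keys/values from the conditioning tokens.
    q = point_feats @ w_q                       # (N, d)
    k = cond_tokens @ w_k                       # (M, d)
    v = cond_tokens @ w_v                       # (M, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])     # (N, M)
    return softmax(scores, axis=-1) @ v         # (N, d)

def pixel_grid(n):
    # Pixel-center coordinates of an n x n grid in [0, 1]^2.
    c = (np.arange(n) + 0.5) / n
    yy, xx = np.meshgrid(c, c, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=-1)

rng = np.random.default_rng(0)
n_freqs, d_model, d_cond = 4, 16, 32
w_q = rng.normal(size=(4 * n_freqs, d_model))
w_k = rng.normal(size=(d_cond, d_model))
w_v = rng.normal(size=(d_cond, d_model))
cond_tokens = rng.normal(size=(8, d_cond))  # hypothetical embeddings of a conditioning signal

def layer(points):
    return cross_attention(fourier_features(points, n_freqs), cond_tokens, w_q, w_k, w_v)

fine = pixel_grid(64)
out_coarse = layer(pixel_grid(32))
out_fine = layer(fine)
print(out_coarse.shape, out_fine.shape)  # (1024, 16) (4096, 16)

# Evaluating on a subset of points reproduces the corresponding rows of the full grid.
out_subset = layer(fine[::7])
print(np.allclose(out_subset, out_fine[::7]))  # True
```

Because each query attends only to the conditioning tokens, the output at a point does not depend on which other points were sampled. A complete denoiser would also need layers that mix information across points, which this sketch leaves out.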
