$\infty$-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions

Minh-Quan Le; Alexandros Graikos; Srikar Yellapragada; Rajarsi Gupta; Joel Saltz; Dimitris Samaras

TL;DR

This paper tackles controllable, high-resolution image synthesis in domains that require very large images, where traditional finite-dimensional diffusion models and patch-based methods struggle to preserve global structure or to scale efficiently. It introduces $\infty$-Brush, a conditional diffusion model that operates in function space and uses a cross-attention neural operator to condition in $\mathcal{H}$, enabling generation at arbitrary resolutions up to $4096\times4096$ while training on only $0.4\%$ of pixels via a smoothing operator $\mathbf{A}$. Key contributions include the first conditional diffusion framework in infinite dimensions, the cross-attention neural operator for function-space conditioning, and a two-level denoiser (sparse and grid levels) that maintains global coherence and local detail in large-scale generation. Empirical results on histopathology and satellite imagery show strong global-structure fidelity (measured by CLIP-FID) and competitive local detail (measured by Crop-FID), with favorable computational efficiency compared to finite-dimensional baselines.
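To give a sense of scale for the $0.4\%$ figure, the sketch below counts how many pixels that fraction would cover and draws a random coordinate subset. It assumes, purely for illustration, that the fraction is taken of a $4096\times4096$ image and that coordinates are drawn uniformly without replacement; the function name `sample_coordinates` and the normalisation to $[0, 1]$ are hypothetical choices, not details taken from the paper.

```python
import random


def sample_coordinates(height, width, fraction, seed=0):
    """Pick a random subset of pixel coordinates covering `fraction` of the image.

    Assumption: uniform sampling without replacement (illustrative only).
    """
    total = height * width
    n = max(1, round(total * fraction))
    rng = random.Random(seed)
    flat = rng.sample(range(total), n)  # distinct flat pixel indices
    # Convert flat indices to (row, col) and normalise to [0, 1] so the
    # coordinates are independent of the image resolution.
    return [((i // width) / (height - 1), (i % width) / (width - 1)) for i in flat]


coords = sample_coordinates(4096, 4096, 0.004)
print(len(coords), "of", 4096 * 4096, "pixels")
```

Under these assumptions the subset holds roughly as many points as a single $256\times256$ image ($65{,}536$ pixels), which is why training cost can stay close to that of a small-image model.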

Abstract

Synthesizing high-resolution images from intricate, domain-specific information remains a significant challenge in generative modeling, particularly for applications in large-image domains such as digital histopathology and remote sensing. Existing methods face critical limitations: conditional diffusion models in pixel or latent space cannot exceed the resolution on which they were trained without losing fidelity, and computational demands increase significantly for larger image sizes. Patch-based methods offer computational efficiency but fail to capture long-range spatial relationships due to their overreliance on local information. In this paper, we introduce $\infty$-Brush, a novel conditional diffusion model in infinite dimensions, for controllable large image synthesis. We propose a cross-attention neural operator to enable conditioning in function space. Our model overcomes the constraints of traditional finite-dimensional diffusion models and patch-based methods, offering scalability and superior capability in preserving global image structures while maintaining fine details. To the best of our knowledge, $\infty$-Brush is the first conditional diffusion model in function space that can controllably synthesize images at arbitrary resolutions of up to $4096\times4096$ pixels. The code is available at https://github.com/cvlab-stonybrook/infinity-brush.
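The idea of conditioning at the level of individual coordinates can be illustrated with ordinary scaled dot-product cross-attention, in which the features at each sampled point act as a query over a fixed set of condition tokens. This is a simplified stand-in, not the paper's cross-attention neural operator: learned projections are omitted, and the names `cross_attention`, `cond_keys`, and `cond_values` are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Each query (features at one coordinate) attends over the condition tokens."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [dot(q, k) * scale for k in keys]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Weighted sum of the condition values, one output vector per query.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


# Two condition tokens (hypothetical embedding of the auxiliary information).
cond_keys = [[1.0, 0.0], [0.0, 1.0]]
cond_values = [[10.0], [20.0]]

# The same conditioning applies to any number of query points.
coarse = cross_attention([[1.0, 0.0]], cond_keys, cond_values)
fine = cross_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], cond_keys, cond_values)
print(len(coarse), len(fine))
```

Because each query point is processed independently, the output for a given point does not change when more points are added, which is the property that lets this kind of conditioning carry over to denser samplings of the same function.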

Paper Structure (23 sections, 8 theorems, 40 equations, 11 figures, 5 tables)

Introduction
Related Work
Preliminaries
Notation and Data
Gaussian Measures on Hilbert Spaces
Diffusion Models in Function Space
Neural Operators
The Proposed Method
Conditional Diffusion Models in Function Space
Conditional Denoiser with Cross-Attention Neural Operators
Experiments
Experimental Settings
Implementation Details
Experimental Results
Limitations
...and 8 more sections

Key Result

Proposition (Learning Objective)

The cross-entropy of conditional diffusion models in function space has a variational upper bound of

Figures (11)

Figure 1: $\infty$-Brush is able to controllably generate images at arbitrary resolutions of up to $4096 \times 4096$, conditioned on any available auxiliary information about the images.
Figure 2: Given a noisy function $\mathbf{u} \in \mathcal{H}$, we discretize it by randomly selecting a subset of coordinates $\mathbf{x} = \{\mathbf{x}^{(i)}\}_{1 \le i \le N} \subset \mathcal{X}$, then feed it into our conditional denoiser, which returns a denoised function $\mathbf{s} \in \mathcal{H}$. The conditional denoiser architecture of $\infty$-Brush includes a sparse level and a grid level. The sparse level (in blue) uses a sparse neural operator, a cross-attention neural operator, and a self-attention neural operator, and focuses on capturing fine-grained details. The grid level (in pink) targets global information. We use k-NN linear interpolation to transform the sparse points to a regularly spaced grid.
Figure 3: Large images ($1024 \times 1024$) generated from our $\infty$-Brush, conditioned on the facial attribute blonde/non-blonde hair.
Figure 4: Very large ($4096 \times 4096$) and large ($1024 \times 1024$) images generated from $\infty$-Brush, and the corresponding reference real images used to generate them. Given a single embedding vector of a downsampled $256\times256$ real image, $\infty$-Brush can synthesize images of up to $4096 \times 4096$ and preserve global structures of the reference image.
Figure 5: Comparison of long-range dependencies between our $\infty$-Brush and the patch-based method of [graikos2023learned]. $\infty$-Brush retains large-scale structures (such as clearly separated clusters of cells) that can span multiple patches, unlike the image generated by [graikos2023learned].
...and 6 more figures
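The sparse-to-grid step mentioned in the Figure 2 caption can be sketched as follows. The caption only says "k-NN linear interpolation"; combining the $k$ nearest neighbours with inverse-distance weights is an assumption made here for illustration, and `knn_interpolate` is a hypothetical helper rather than a function from the released code.

```python
import math


def knn_interpolate(points, values, grid_size, k=3, eps=1e-8):
    """Interpolate sparse samples on [0, 1]^2 onto a grid_size x grid_size grid.

    Assumption: each grid node takes the inverse-distance-weighted average
    of the values at its k nearest sparse points.
    """
    grid = []
    for r in range(grid_size):
        row = []
        for c in range(grid_size):
            gx, gy = r / (grid_size - 1), c / (grid_size - 1)
            # Sort sparse points by distance to this grid node and keep the k nearest.
            nearest = sorted(
                (math.hypot(px - gx, py - gy), v)
                for (px, py), v in zip(points, values)
            )[:k]
            weights = [1.0 / (d + eps) for d, _ in nearest]
            total = sum(weights)
            row.append(sum(w * v for w, (_, v) in zip(weights, nearest)) / total)
        grid.append(row)
    return grid


# Four sparse samples at the corners of the unit square.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
values = [0.0, 1.0, 1.0, 2.0]
grid = knn_interpolate(points, values, grid_size=3, k=4)
print(grid[1][1])  # centre node: equidistant from all four samples
```

This brute-force version scans every sparse point for every grid node; it is meant only to show the data flow from scattered samples to a regular grid.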

Theorems & Definitions (15)

Proposition: Learning Objective
Proof
Lemma: Measure Equivalence - The Feldman-Hájek Theorem
Lemma: The Radon-Nikodym Derivative
Proof
Theorem: Conditional Diffusion Optimality in Function Space
Proof
Proposition: Learning Objective
Proof
Lemma: Measure Equivalence - The Feldman-Hájek Theorem
...and 5 more
