$\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States

Sam Bond-Taylor, Chris G. Willcocks

TL;DR

This work tackles the challenge of scaling diffusion models to arbitrary resolutions by modeling data as functions in an infinite-dimensional space and training on randomly subsampled coordinates. It introduces $\infty$-Diff, a mollified diffusion framework with neural-operator denoisers that map between function spaces, removing the need for the latent compression common in neural-field approaches. Key contributions include a practical finite-time diffusion process in a Hilbert space, a multi-scale neural-operator architecture with efficient sparse components, and empirical results on high-resolution datasets showing high-quality samples at up to $8\times$ subsampling, along with discretisation invariance and capabilities such as super-resolution and inpainting. The approach yields substantial run-time and memory savings, competitive or superior FID scores, and sampling that scales beyond the training resolution, offering a viable path for high-resolution generative modeling without fixed grids or heavy latent compression.

Abstract

This paper introduces $\infty$-Diff, a generative diffusion model defined in an infinite-dimensional Hilbert space, which can model infinite resolution data. By training on randomly sampled subsets of coordinates and denoising content only at those locations, we learn a continuous function for arbitrary resolution sampling. Unlike prior neural field-based infinite-dimensional models, which use point-wise functions requiring latent compression, our method employs non-local integral operators to map between Hilbert spaces, allowing spatial context aggregation. This is achieved with an efficient multi-scale function-space architecture that operates directly on raw sparse coordinates, coupled with a mollified diffusion process that smooths out irregularities. Through experiments on high-resolution datasets, we found that even at an $8\times$ subsampling rate, our model retains high-quality diffusion. This leads to significant run-time and memory savings, delivers samples with lower FID scores, and scales beyond the training resolution while retaining detail.
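The two ingredients named in the abstract, random coordinate subsampling and mollification, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names `subsample_coords` and `gaussian_mollify`, the periodic Gaussian kernel, the value of `alpha_bar`, the choice to smooth both image and noise, and the reading of "$8\times$ subsampling" as keeping one coordinate in eight are all assumptions made for illustration.

```python
import numpy as np


def subsample_coords(height, width, rate, rng):
    """Pick a random subset of pixel coordinates, keeping 1/rate of them.

    Reading an "8x subsampling rate" as "keep one coordinate in eight"
    is an assumption made for this sketch.
    """
    n_total = height * width
    n_keep = n_total // rate
    flat_idx = rng.choice(n_total, size=n_keep, replace=False)
    rows, cols = np.unravel_index(flat_idx, (height, width))
    # Normalise to [0, 1] so the coordinates do not depend on the grid size.
    coords = np.stack([rows / (height - 1), cols / (width - 1)], axis=-1)
    return flat_idx, coords


def gaussian_mollify(field, sigma):
    """Smooth a 2-D field with a periodic Gaussian kernel via the FFT."""
    h, w = field.shape
    fy = np.fft.fftfreq(h)[:, None]  # frequencies in cycles per pixel
    fx = np.fft.fftfreq(w)[None, :]
    # Fourier transform of a unit-mass Gaussian with std `sigma` pixels.
    kernel_hat = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fy ** 2 + fx ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(field) * kernel_hat))


rng = np.random.default_rng(0)
image = rng.random((64, 64))           # stand-in for a training image
noise = rng.standard_normal((64, 64))  # white noise on the full grid

# Mollified noise is smoother than white noise.
smooth_noise = gaussian_mollify(noise, sigma=1.0)

# A noisy state mixing smoothed data and smoothed noise.
alpha_bar = 0.5  # hypothetical noise-schedule value
state = (np.sqrt(alpha_bar) * gaussian_mollify(image, sigma=1.0)
         + np.sqrt(1.0 - alpha_bar) * smooth_noise)

# Only the subsampled locations would be passed to the denoiser.
idx, coords = subsample_coords(64, 64, rate=8, rng=rng)
state_at_coords = state.reshape(-1)[idx]
```

Here the denoiser would see 512 of the 4096 grid locations together with their normalised coordinates, which is where the run-time and memory savings would come from under these assumptions.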

Paper Structure (27 sections, 27 equations, 15 figures, 5 tables)

This paper contains 27 sections, 27 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: We define a diffusion process in an infinite-dimensional image space by randomly sampling coordinates and training a model parameterised by neural operators to denoise at those coordinates.
  • Figure 2: Modelling data as functions allows sampling at arbitrary resolutions using the same model with different sized noise. Left to right: $64\!\times\!64$, $128\!\times\!128$, $256\!\times\!256$ (original), $512\!\times\!512$, $1024\!\times\!1024$.
  • Figure 3: Example diffusion processes. Mollified diffusion smooths diffusion states allowing the space to be more effectively modelled with continuous operators.
  • Figure 4: $\infty$-Diff uses a hierarchical architecture that operates on irregularly sampled functions at the top level to efficiently capture fine details, and on fixed grids at the other levels to capture global structure. This approach allows scaling to intricate high-resolution data.
  • Figure 5: Samples from $\infty$-Diff models trained on sets of randomly subsampled coordinates.
  • ...and 10 more figures