$\infty$-Diff: Infinite Resolution Diffusion with Subsampled Mollified States
Sam Bond-Taylor, Chris G. Willcocks
TL;DR
This work tackles the challenge of scaling diffusion models to arbitrary resolutions by modeling data as functions in an infinite-dimensional space and training on randomly subsampled coordinates. It introduces $\infty$-Diff, a mollified diffusion framework with neural-operator denoisers that map between function spaces, removing the need for the latent compression common in neural-field approaches. Key contributions include a practical finite-time diffusion process in a Hilbert space, a multi-scale neural-operator architecture with efficient sparse components, and empirical results on high-resolution datasets showing high-quality samples with up to $8\times$ subsampling, along with discretization invariance and capabilities such as super-resolution and inpainting. The approach yields substantial run-time and memory savings, competitive or superior FID scores, and sampling that scales beyond the training resolution, offering a viable path for high-resolution generative modeling without fixed grids or heavy latent compression.
Abstract
This paper introduces $\infty$-Diff, a generative diffusion model defined in an infinite-dimensional Hilbert space, which can model infinite resolution data. By training on randomly sampled subsets of coordinates and denoising content only at those locations, we learn a continuous function for arbitrary resolution sampling. Unlike prior neural field-based infinite-dimensional models, which use point-wise functions requiring latent compression, our method employs non-local integral operators to map between Hilbert spaces, allowing spatial context aggregation. This is achieved with an efficient multi-scale function-space architecture that operates directly on raw sparse coordinates, coupled with a mollified diffusion process that smooths out irregularities. Through experiments on high-resolution datasets, we find that even at an $8\times$ subsampling rate, our model retains high sample quality. Subsampling yields significant run-time and memory savings, and the model delivers samples with lower FID scores and scales beyond the training resolution while retaining detail.
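The two ingredients named above, a mollified diffusion state and random coordinate subsampling, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes the mollifier is a Gaussian blur applied to both the data and the noise, that the noise schedule is summarized by a single scalar `alpha_bar`, and that images are single-channel arrays. The function names (`mollify`, `subsample_coords`, `noisy_subsampled_state`) and the parameter `sigma` are hypothetical, chosen only for illustration.

```python
import numpy as np


def gaussian_kernel_1d(sigma, radius):
    """Normalized 1D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def mollify(img, sigma=1.0):
    """Separable Gaussian blur with reflect padding.

    Assumption: this blur stands in for the mollifier; the abstract only
    says the process "smooths out irregularities".
    """
    radius = max(1, int(3 * sigma))
    k = gaussian_kernel_1d(sigma, radius)
    padded = np.pad(img, radius, mode="reflect")
    # Blur along rows first, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


def subsample_coords(height, width, rate, rng):
    """Pick (height * width) // rate distinct pixel coordinates uniformly."""
    n_total = height * width
    flat = rng.choice(n_total, size=n_total // rate, replace=False)
    return np.unravel_index(flat, (height, width))


def noisy_subsampled_state(x0, alpha_bar, rate, rng, sigma=1.0):
    """Build a mollified noisy state and keep only a random coordinate subset."""
    noise = rng.standard_normal(x0.shape)
    # Assumed form: the same smoothing operator acts on data and noise.
    x_t = np.sqrt(alpha_bar) * mollify(x0, sigma) \
        + np.sqrt(1.0 - alpha_bar) * mollify(noise, sigma)
    rows, cols = subsample_coords(x0.shape[0], x0.shape[1], rate, rng)
    coords = np.stack([rows, cols], axis=-1)  # shape (n_keep, 2)
    return coords, x_t[rows, cols]            # values only at kept locations


rng = np.random.default_rng(0)
x0 = rng.random((64, 64))  # stand-in for one image channel
coords, values = noisy_subsampled_state(x0, alpha_bar=0.5, rate=8, rng=rng)
```

With `rate=8`, only one eighth of the pixel locations would be passed to a denoiser, which is where the run-time and memory savings described above would come from. The denoiser itself (the multi-scale neural-operator architecture) is not sketched here.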
