Frequency-Time Diffusion with Neural Cellular Automata

John Kalkhof; Arlene Kühn; Yannik Frisch; Anirban Mukhopadhyay

TL;DR

The paper tackles the high parameter and hardware demands of traditional UNet-based diffusion models while enabling diffusion at arbitrary image sizes. It introduces Diff-NCA, a parameter-efficient, locally communicating diffusion engine, and FourierDiff-NCA, which adds Fourier-domain global communication to capture global structure early in the diffusion process. Empirical results show FourierDiff-NCA achieving competitive or superior FID/KID scores with far fewer parameters than UNet baselines, and Diff-NCA enabling seamless, scalable pathology image generation. The approaches also demonstrate versatility in tasks such as super-resolution, inpainting, and out-of-distribution synthesis, highlighting potential for democratizing high-quality generative modeling on limited hardware.

Abstract

Despite considerable success, large Denoising Diffusion Models (DDMs) with a UNet backbone pose practical challenges, particularly on limited hardware and when processing gigapixel images. To address these limitations, we introduce two Neural Cellular Automata (NCA)-based DDMs: Diff-NCA and FourierDiff-NCA. Capitalizing on the local communication capabilities of NCA, Diff-NCA significantly reduces the parameter counts of NCA-based DDMs. Integrating Fourier-based diffusion enables global communication early in the diffusion process. This feature is particularly valuable when synthesizing complex images with important global features, such as those in the CelebA dataset. We demonstrate that even a 331k-parameter Diff-NCA can generate 512×512 pathology slices, while FourierDiff-NCA (1.1m parameters) reaches an FID score of 43.86, roughly three times lower than the 128.2 obtained by a UNet that is about four times larger (3.94m parameters). Additionally, FourierDiff-NCA can perform diverse tasks such as super-resolution, out-of-distribution image synthesis, and inpainting without explicit training.
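The contrast between local NCA communication and Fourier-domain global communication described in the abstract can be illustrated with a minimal sketch. This is not the authors' architecture: the fixed averaging kernel, the Gaussian frequency gain, and the names local_step, fourier_step, and steps_to_cover are hypothetical stand-ins for learned update rules. The sketch only shows how far information from a single pixel travels in each setting.

```python
import numpy as np

def local_step(state, kernel):
    """One NCA-style update: each cell reads only its 3x3 neighbourhood (wrap-around borders)."""
    h, w = state.shape
    padded = np.pad(state, 1, mode="wrap")
    out = np.zeros_like(state)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

def fourier_step(state, gain):
    """One frequency-domain update: scaling Fourier coefficients touches every pixel at once."""
    spectrum = np.fft.fft2(state)
    return np.real(np.fft.ifft2(spectrum * gain))

def steps_to_cover(state, kernel):
    """Count local steps until a signal has reached every cell of the grid."""
    steps = 0
    while np.count_nonzero(state) < state.size:
        state = local_step(state, kernel)
        steps += 1
    return steps

size = 32
impulse = np.zeros((size, size))
impulse[size // 2, size // 2] = 1.0  # a single "informed" pixel

# Assumption: a fixed averaging kernel stands in for a learned local rule.
kernel = np.full((3, 3), 1.0 / 9.0)
local_reach = int(np.count_nonzero(local_step(impulse, kernel)))
local_steps = steps_to_cover(impulse, kernel)

# Assumption: a Gaussian low-pass gain stands in for a learned frequency-space rule.
fy = np.fft.fftfreq(size)[:, None]
fx = np.fft.fftfreq(size)[None, :]
gain = np.exp(-2000.0 * (fx ** 2 + fy ** 2))
fourier_reach = int(np.count_nonzero(np.abs(fourier_step(impulse, gain)) > 1e-9))

print(local_reach, local_steps, fourier_reach)
```

In this toy setting one local step informs only the 3×3 neighbourhood, and covering the whole grid takes a number of steps that grows with the image side length, whereas a single frequency-domain step reaches every pixel. This is one way to read the abstract's claim that Fourier-based diffusion enables global communication early in the process.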

Paper Structure (33 sections, 13 figures, 3 tables)

Introduction
Related Work
Neural Cellular Automata (NCA)
NCA Image Generation
Denoising Diffusion Models
Methodology
Fourier Space: Single-Step Global Communication
FourierDiff-NCA Architecture
Model Architecture
Experimental Results
Data and Infrastructure
Metrics
Qualitative Comparison: Image Synthesis
Quantitative Comparison: Image Synthesis
Ablation
...and 18 more sections

Figures (13)

Figure 1: Diff-NCA is parameter-efficient while being able to generate infinite seamless images. The one-cell model size allows FourierDiff-NCA to be applied to inputs that differ from the training size, thus generating images of different shapes and scales. The same architecture also allows it to efficiently regenerate parts of an image in an inpainting task and to perform super-resolution on an existing image without retraining.
Figure 2: Diff-NCA predicts the noise using the iterative local communication of NCAs, whereas FourierDiff-NCA additionally uses Fourier space to communicate global knowledge across the image.
Figure 3: Qualitative comparison between FourierDiff-NCA, VNCA, and a DDM based on UNet M, with parameter counts of 1.85m, 9.73m, and 3.94m, respectively.
Figure 4: Influence of parameter count on image generation performance for FourierDiff-NCA, UNet, and VNCA (the detailed numbers can be found in the appendix in Table \ref{tab:quant}). Green quadrant: low parameters, high performance; red quadrant: high parameters, low performance; yellow quadrants: tradeoff.
Figure 5: Out-Of-Distribution image synthesis with FourierDiff-NCA (1.85m) of different scales and shapes.
...and 8 more figures
