PDE-SSM: A Spectral State Space Approach to Spatial Mixing in Diffusion Transformers

Eshed Gal; Moshe Eliasof; Siddharth Rout; Eldad Haber

Abstract

The success of vision transformers, especially for generative modeling, is limited by the quadratic cost and weak spatial inductive bias of self-attention. We propose PDE-SSM, a spatial state-space block that replaces attention with a learnable convection-diffusion-reaction partial differential equation. This operator encodes a strong spatial prior by modeling information flow via physically grounded dynamics rather than all-to-all token interactions. Solving the PDE in the Fourier domain yields global coupling at near-linear cost, $O(N \log N)$, delivering a principled and scalable alternative to attention. We integrate PDE-SSM into a flow-matching generative model to obtain a PDE-based Diffusion Transformer, PDE-SSM-DiT. Empirically, PDE-SSM-DiT matches or exceeds the performance of state-of-the-art Diffusion Transformers while substantially reducing compute. Our results show that, analogous to 1D settings where SSMs supplant attention, multi-dimensional PDE operators provide an efficient, inductive-bias-rich foundation for next-generation vision models.
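To make the spectral solve concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single channel, a periodic grid with unit spacing, constant coefficients, and a diagonal diffusion tensor. The function name `spectral_cdr_step` and its parameters are hypothetical. Under these assumptions, the equation $u_t = \nabla \cdot (D \nabla u) - v \cdot \nabla u - r u$ becomes a pointwise multiplication in the Fourier domain, so the cost is dominated by two FFTs.

```python
import numpy as np

def spectral_cdr_step(u0, diffusion, velocity, reaction, t=1.0):
    """Evolve u_t = div(D grad u) - v . grad u - r u for time t.

    Assumptions (illustrative only): periodic boundaries, unit grid spacing,
    constant scalar coefficients, diagonal diffusion D = diag(dyy, dxx).
    """
    h, w = u0.shape
    # Angular wavenumbers along each axis of the periodic grid.
    ky = 2.0 * np.pi * np.fft.fftfreq(h)[:, None]
    kx = 2.0 * np.pi * np.fft.fftfreq(w)[None, :]
    dyy, dxx = diffusion
    vy, vx = velocity
    # Fourier symbol of the operator: diffusion damps high frequencies,
    # convection shifts phase, reaction applies a uniform decay.
    symbol = -(dyy * ky**2 + dxx * kx**2) - 1j * (vy * ky + vx * kx) - reaction
    u_hat = np.fft.fft2(u0)
    # Exact solution of the linear ODE per frequency, then back to space.
    return np.real(np.fft.ifft2(np.exp(t * symbol) * u_hat))

# Applying the operator to a unit impulse gives its convolution kernel.
impulse = np.zeros((16, 16))
impulse[4, 4] = 1.0
kernel = spectral_cdr_step(impulse, diffusion=(0.5, 2.0),
                           velocity=(2.0, 3.0), reaction=0.1)
```

Feeding in a unit impulse, as in the last lines, returns the operator's convolution kernel. Varying the three coefficient groups in this sketch yields localized, anisotropically blurred, or shifted kernels.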

Paper Structure (39 sections, 13 equations, 8 figures, 10 tables, 1 algorithm)

Introduction
Our approach.
Our contributions are as follows:
From 1D SSM to a Spatial PDE-SSM
A Differential Operator View of SSMs
Generalizing to Space: The PDE-SSM Formulation
The Embedding Operator $\mathcal{B}_{\gamma}$.
The PDE Evolution and its Green's Function $\mathcal{G}_{\zeta}$.
Theoretical Properties of PDE-SSM.
Efficient Implementation with Multi-Channel Coupling
Computational Complexity.
Using PDE-SSM within a Diffusion Transformer
Background on Flow-Matching.
Patch size and complexity.
Experiments
...and 24 more sections

Figures (8)

Figure 1: Visualizing the PDE-SSM Convolutional Kernels. By sampling the learnable parameters $\xi = (\mathcal{B}_{\gamma}, \zeta)$, our PDE-SSM can represent a diverse family of convolutional kernels. The examples show kernels that are (from left to right): localized, directionally blurred (anisotropic diffusion), shifted (convection), and a combination of effects. This flexibility allows our PDE-SSM model to learn a rich basis for spatial feature mixing, including non-local connections.
Figure 1: PDE-SSM Forward Pass
Figure 2: CIFAR-10 Images: (a) real images; (b) DiT; (c) PDE-SSM-DiT. Visual quality is comparable, consistent with Table \ref{tab:cifar10}.
Figure 3: ImageNet$64$ training. (a) All methods converge at a similar rate and to similar FID scores. (b) The achieved FID score is consistent with the internal FID score.
Figure 4: LSUN-Churches generations: (a) real images; (b) DiT; (c) PDE-SSM-DiT.
...and 3 more figures
