Linear Diffusion Networks
Jacob Fein-Ashley
TL;DR
LDN addresses the sequential-processing bottleneck in sequence modeling by recasting temporal information sharing as a diffusion process. It combines a PDE-inspired primary diffusion kernel $K$, a local update $F$, and a diffusion-based attention kernel $A_{\text{diff}}$ with an adaptive time step $\delta t$, enabling stable, parallelizable, multi-scale sequence processing. Its global interactions are constrained so that every kernel row sums to zero, which yields robust training and strong empirical results on ImageNet and the Long Range Arena (LRA), often with fewer parameters and FLOPs than competing transformers. This diffusion-centric framework combines efficient computation with expressive representation learning, offering a versatile approach to sequence modeling in both vision and language.
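One plausible reading of how these components combine is a single explicit Euler step $h \leftarrow h + \delta t\,(Kh + A_{\text{diff}}h + F(h))$; this additive form is an assumption, not the paper's stated update. The minimal NumPy sketch below also assumes that $K$ is a 1-D discrete Laplacian, that $A_{\text{diff}}$ is softmax attention minus the identity (so its rows sum to zero), that $F$ is a position-wise `tanh` layer, and that $\delta t$ is set by a simple stability cap. The helper names (`laplacian_kernel`, `diffusion_attention`, `ldn_step`) are hypothetical.

```python
import numpy as np

def laplacian_kernel(n: int, kappa: float = 1.0) -> np.ndarray:
    """Hypothetical K: 1-D discrete Laplacian over n positions.

    Neighbours get +kappa; each diagonal entry is minus the sum of its
    row's off-diagonals, so every row sums to zero.
    """
    K = np.zeros((n, n))
    idx = np.arange(n - 1)
    K[idx, idx + 1] = kappa
    K[idx + 1, idx] = kappa
    K[np.arange(n), np.arange(n)] = -K.sum(axis=1)
    return K

def diffusion_attention(h: np.ndarray, Wq: np.ndarray, Wk: np.ndarray) -> np.ndarray:
    """Hypothetical A_diff: row-softmax attention minus the identity (row sums = 0)."""
    q, k = h @ Wq, h @ Wk
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(scores)
    P /= P.sum(axis=1, keepdims=True)
    return P - np.eye(h.shape[0])

def ldn_step(h, K, Wq, Wk, Wf, dt_max=0.5):
    """One assumed explicit-Euler step: h + dt * (K h + A_diff h + F(h))."""
    A = diffusion_attention(h, Wq, Wk)
    # Assumed step-size rule: cap dt by 1 / max|diag(K)|, the explicit-Euler
    # stability bound for a symmetric K with non-negative off-diagonals.
    dt = min(dt_max, 1.0 / np.abs(np.diag(K)).max())
    F = np.tanh(h @ Wf)  # hypothetical local, position-wise update
    return h + dt * (K @ h + A @ h + F), dt

# Usage: a random sequence of 8 positions with 4 features each.
rng = np.random.default_rng(0)
n, d = 8, 4
h = rng.standard_normal((n, d))
K = laplacian_kernel(n)
Wq, Wk, Wf = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
h_next, dt = ldn_step(h, K, Wq, Wk, Wf)
```

Because both $K$ and this $A_{\text{diff}}$ have zero row sums, a sequence that is constant across positions is left unchanged by the two diffusion terms; only the local update $F$ can move it.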
Abstract
We present Linear Diffusion Networks (LDNs), a novel architecture that reinterprets sequential data processing as a unified diffusion process. The model integrates adaptive diffusion modules with localized nonlinear updates and a diffusion-inspired attention mechanism. This design enables efficient global information propagation while preserving fine-grained temporal details. LDN addresses limitations of conventional recurrent and transformer models: it allows full parallelization across time steps and supports robust multi-scale temporal representations. Experiments on ImageNet and the Long Range Arena (LRA) benchmark show that LDN achieves competitive performance.
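To illustrate the parallelization and global-propagation claims, the sketch below compares a local diffusion kernel with a dense one. It is an illustration under assumptions, not the paper's implementation: a 1-D Laplacian stands in for the local kernel, uniform attention minus the identity stands in for a dense diffusion-attention kernel, and the fixed step size and the `reach` helper are hypothetical.

```python
import numpy as np

n, kappa, dt = 16, 1.0, 0.25  # hypothetical sequence length, diffusivity, step

# Local kernel: tridiagonal Laplacian with zero row sums.
K = np.diag(np.full(n - 1, kappa), 1) + np.diag(np.full(n - 1, kappa), -1)
K -= np.diag(K.sum(axis=1))

h = np.zeros(n)
h[0] = 1.0  # impulse at the first position

# Parallel form: every position is updated by one matrix-vector product.
step = np.eye(n) + dt * K
h_par = step @ h

# Equivalent per-position form; no position depends on another's new value,
# so these n computations are independent (unlike an RNN recurrence).
h_loop = np.array([h[i] + dt * sum(K[i, j] * h[j] for j in range(n)) for i in range(n)])

# Dense stand-in kernel: uniform attention minus identity (row sums = 0).
A = np.full((n, n), 1.0 / n) - np.eye(n)
global_step = np.eye(n) + dt * A

def reach(M: np.ndarray, x: np.ndarray, steps: int) -> int:
    """Furthest position holding non-negligible signal after `steps` updates."""
    for _ in range(steps):
        x = M @ x
    return int(np.nonzero(np.abs(x) > 1e-12)[0].max())

print(reach(step, h, 3), reach(global_step, h, 1))
```

The local kernel spreads the impulse by one position per step, so distant positions need many steps, while the dense kernel reaches every position in a single step. In both cases each step is one matrix product over all positions rather than a sequential recurrence.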
