A Unified Framework for U-Net Design and Analysis
Christopher Williams, Fabian Falck, George Deligiannidis, Chris Holmes, Arnaud Doucet, Saifuddin Syed
TL;DR
The paper introduces a rigorous, unified framework for designing and analyzing U-Nets via subspace preconditioning, formalizing encoder/decoder roles, projections, and a multi-resolution bottleneck. It proves a convergence link between finite-resolution U-Nets and their infinite-resolution limit, and shows U-Nets are conjugate to ResNets through preconditioning, enabling a high-resolution scaling perspective. The authors instantiate generalized architectures like Multi-ResNets (encoder-free Haar wavelet encoders), and demonstrate how to enforce boundary conditions or geometries (triangular domains, PDE constraints) directly in the architecture. Through experiments on PDE surrogate modelling, image segmentation, and diffusion-model tasks, they show competitive performance and reveal when fixed encoders are advantageous versus when learned encoders help, while also highlighting staged training for multi-resolution diffusion and topological-structure encoding. The framework paves the way for problem-informed, scalable U-Nets applicable to complex geometries and distributions beyond square domains, with implications for diffusion models and PDE-solving neural surrogates.
Abstract
U-Nets are a go-to, state-of-the-art neural architecture across numerous tasks for continuous signals on a square such as images and Partial Differential Equations (PDE), however their design and architecture is understudied. In this paper, we provide a framework for designing and analysing general U-Net architectures. We present theoretical results which characterise the role of the encoder and decoder in a U-Net, their high-resolution scaling limits and their conjugacy to ResNets via preconditioning. We propose Multi-ResNets, U-Nets with a simplified, wavelet-based encoder without learnable parameters. Further, we show how to design novel U-Net architectures which encode function constraints, natural bases, or the geometry of the data. In diffusion models, our framework enables us to identify that high-frequency information is dominated by noise exponentially faster, and show how U-Nets with average pooling exploit this. In our experiments, we demonstrate how Multi-ResNets achieve competitive and often superior performance compared to classical U-Nets in image segmentation, PDE surrogate modelling, and generative modelling with diffusion models. Our U-Net framework paves the way to study the theoretical properties of U-Nets and design natural, scalable neural architectures for a multitude of problems beyond the square.
