A Mathematical Explanation of UNet
Xue-Cheng Tai, Hao Liu, Raymond H. Chan, Lingfeng Li
TL;DR
This work provides a rigorous mathematical framing of UNet by casting image segmentation as a constrained control problem and solving it with a multigrid-based hybrid operator-splitting method. The authors decompose the control variables across multiple scales and show that a single iteration of this splitting scheme yields the UNet architecture, including its encoder, bottleneck, decoder, and skip connections. The main contribution is a principled interpretation that connects continuous control dynamics, multigrid discretization, and operator splitting to the practical UNet design. It offers a theoretical explanation of the architecture and possible avenues for generalizing it to other encoder-decoder networks. The result shows that UNet can be viewed as a one-step solver for a well-posed control problem, highlighting an algorithmic core shared by many neural architectures used in image segmentation and related tasks.
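The idea of splitting a control variable across several scales can be illustrated with a toy 1D example. The sketch below writes a fine-grid vector as a sum of components living on successively coarser grids, each brought back to the fine grid by prolongation. It assumes pairwise averaging as the restriction operator and piecewise-constant prolongation. These operators and the function names (`restrict`, `prolong`, `multiscale_decompose`, `reconstruct`) are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def restrict(v):
    """Coarsen by averaging neighbouring pairs (assumed restriction operator)."""
    return v.reshape(-1, 2).mean(axis=1)

def prolong(v, factor):
    """Piecewise-constant prolongation: repeat each coarse value `factor` times."""
    return np.repeat(v, factor)

def multiscale_decompose(v, levels):
    """Split v into components w_0 (finest grid) ... w_levels (coarsest grid)
    such that v == sum_k prolong(w_k, 2**k). Assumes len(v) divisible by 2**levels."""
    components = []
    current = np.asarray(v, dtype=float)
    for _ in range(levels):
        coarse = restrict(current)
        # Detail that is lost when moving one level coarser, stored on the current grid
        components.append(current - prolong(coarse, 2))
        current = coarse
    components.append(current)  # coarsest-level component
    return components

def reconstruct(components):
    """Prolong every component back to the finest grid and sum them."""
    n = components[0].size
    return sum(prolong(w, n // w.size) for w in components)

v = np.arange(8.0)
parts = multiscale_decompose(v, levels=3)
print([p.size for p in parts])              # grid sizes of the components
print(np.allclose(reconstruct(parts), v))   # the decomposition is exact
```

Each component lives on its own grid, so a per-scale update can act on it independently, which is the kind of structure a multigrid decomposition of the control variables is meant to expose.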
Abstract
The UNet architecture has transformed image segmentation. UNet's versatility and accuracy have driven its widespread adoption, significantly advancing fields that rely on machine learning for image data. In this work, we give a clear and concise mathematical explanation of UNet. We explain the meaning and function of each of UNet's components. We show that UNet solves a control problem. We decompose the control variables using multigrid methods and then solve the problem with operator-splitting techniques; the architecture of the resulting algorithm exactly recovers the UNet architecture. Our result shows that UNet is a one-step operator-splitting algorithm for the control problem.
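To make "one step of operator splitting" concrete, here is a generic sketch of a single step of sequential (Lie) splitting on the scalar ODE du/dt = -a u + b. Each sub-problem is solved on its own over the step, and its output is fed into the next. The ODE, the coefficients `a` and `b`, and the function names are illustrative assumptions. The paper applies a hybrid splitting scheme to a control problem, which this toy does not reproduce.

```python
import math

def lie_splitting_step(u0, dt, sub_steps):
    """One step of sequential (Lie) operator splitting: advance u0 by dt
    through each sub-problem solver in turn, chaining the outputs."""
    u = u0
    for solve in sub_steps:
        u = solve(u, dt)
    return u

a, b = 2.0, 1.0  # assumed coefficients for the toy problem

def decay(u, dt):
    # Sub-problem 1: du/dt = -a*u, solved exactly over dt
    return u * math.exp(-a * dt)

def forcing(u, dt):
    # Sub-problem 2: du/dt = b, solved exactly over dt
    return u + b * dt

def exact(u0, t):
    # Exact solution of the full problem du/dt = -a*u + b
    return b / a + (u0 - b / a) * math.exp(-a * t)

for dt in (0.01, 0.005):
    u_split = lie_splitting_step(1.0, dt, [decay, forcing])
    print(dt, abs(u_split - exact(1.0, dt)))
```

Because the two sub-problems do not commute, a single step incurs a local error of order dt squared, so halving `dt` cuts the one-step error by roughly a factor of four.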
