A Mathematical Explanation of UNet

Xue-Cheng Tai; Hao Liu; Raymond H. Chan; Lingfeng Li

TL;DR

This work provides a rigorous mathematical framing of UNet by casting image segmentation as a constrained control problem and solving it with a multigrid-based hybrid operator-splitting method. The authors decompose the control variables across multiple scales and show that a single iteration of this splitting scheme yields the UNet architecture, including encoder, bottleneck, decoder, and skip connections. The main contribution is a principled interpretation that connects continuous control dynamics, multigrid discretization, and operator splitting to the practical UNet design, offering a theoretical explanation and potential avenues for generalizing to other encoder–decoder networks. The approach demonstrates that UNet can be seen as a one-step solver for a well-posed control problem, highlighting the algorithmic core shared by many neural architectures used in image segmentation and related tasks.
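
As a generic illustration of the operator-splitting idea mentioned above, the following minimal sketch applies sequential (Lie) splitting to the scalar model problem du/dt = -(a + b) u. It is not the paper's hybrid scheme: the model problem, the function names `lie_splitting` and `splitting_error`, and the choice of backward-Euler sub-steps are all assumptions made for illustration only.

```python
import math


def lie_splitting(u0, a, b, T, N):
    """Integrate du/dt = -(a + b) * u on [0, T] by sequential (Lie) splitting.

    Each time step solves the two sub-problems du/dt = -a*u and
    du/dt = -b*u one after the other, each with one backward-Euler step.
    """
    dt = T / N
    u = u0
    for _ in range(N):
        u = u / (1.0 + dt * a)  # sub-step 1: implicit Euler for du/dt = -a*u
        u = u / (1.0 + dt * b)  # sub-step 2: implicit Euler for du/dt = -b*u
    return u


def splitting_error(N, u0=1.0, a=1.0, b=2.0, T=1.0):
    """Absolute error of the split scheme against the exact solution at time T."""
    exact = u0 * math.exp(-(a + b) * T)
    return abs(lie_splitting(u0, a, b, T, N) - exact)
```

Calling `splitting_error` with increasing `N` shows the error shrinking as the time step decreases. The paper's claim concerns the opposite extreme: a single step of its splitting scheme already has the shape of a UNet.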

Abstract

The UNet architecture has transformed image segmentation. UNet's versatility and accuracy have driven its widespread adoption, significantly advancing fields that rely on machine learning for image problems. In this work, we give a clear and concise mathematical explanation of UNet. We explain the meaning and function of each component of UNet. We show that UNet solves a control problem. We decompose the control variables using multigrid methods. Operator-splitting techniques are then used to solve the problem, and the resulting algorithm has a structure that exactly recovers the UNet architecture. Our result shows that UNet is a one-step operator-splitting algorithm for the control problem.
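
To make the encoder, bottleneck, decoder, and skip-connection structure concrete, here is a minimal NumPy sketch of a single V-cycle-shaped pass. It is a simplification and not the authors' algorithm: the helper names (`restrict`, `prolong`, `layer`, `v_cycle`), the scalar per-level weights and biases standing in for the control variables, the average-pooling and nearest-neighbour transfer operators, and the additive (rather than concatenating) skip connection are all assumptions chosen for brevity.

```python
import numpy as np


def restrict(u):
    """Coarsen by 2x2 average pooling (stand-in for UNet downsampling)."""
    h, w = u.shape
    return u.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def prolong(u):
    """Refine by nearest-neighbour upsampling (stand-in for UNet upsampling)."""
    return np.repeat(np.repeat(u, 2, axis=0), 2, axis=1)


def layer(u, weight, bias):
    """One hypothetical building block: scalar affine map followed by ReLU."""
    return np.maximum(weight * u + bias, 0.0)


def v_cycle(image, weights, biases):
    """One encoder-bottleneck-decoder pass over len(weights) grid levels.

    The image side lengths must be divisible by 2 ** (len(weights) - 1).
    """
    skips = []
    u = image
    # Encoder: process on the current grid, store the result, then coarsen.
    for w, b in zip(weights[:-1], biases[:-1]):
        u = layer(u, w, b)
        skips.append(u)
        u = restrict(u)
    # Bottleneck on the coarsest grid.
    u = layer(u, weights[-1], biases[-1])
    # Decoder: refine, add the stored encoder feature (skip connection), process.
    for w, b, skip in zip(weights[-2::-1], biases[-2::-1], skips[::-1]):
        u = layer(prolong(u) + skip, w, b)
    return u
```

For example, `v_cycle(np.ones((8, 8)), [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])` runs three grid levels (8x8, 4x4, 2x2) and returns an 8x8 array. The pass visits each level once on the way down and once on the way up, which is the single-iteration structure the abstract identifies with UNet.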

Paper Structure (16 sections, 1 theorem, 35 equations, 3 figures, 1 table, 2 algorithms)

Introduction
Proposed formulation
The control problem
Hybrid splitting methods
Multigrid discretizations
The proposed algorithm
Decomposition of control variables $\theta_1$
Algorithm details
On the choices of $S,\widetilde{S}$
On the solution to (\ref{eq.full.v}), (\ref{eq.full.u}) and (\ref{eq.full.final})
Initial condition
Discretization
Algorithm \ref{alg.V.full} recovers UNet
Algorithm \ref{alg.V.full} building blocks recover UNet layers
Algorithm \ref{alg.V.full} structure recovers UNet architecture
...and 1 more sections

Key Result

Theorem 1

For a fixed $T>0$ and a positive integer $N$, set $\Delta t=T/N$. Let $u^{n+1}$ be the numerical solution produced by Algorithm \ref{alg.hybrid}. Assume the $A_{k,s}^m$ and $S_k^m$ are Lipschitz with respect to $t,\mathbf{x}$, and are linear symmetric positive definite operators with respect to $u$. Assume $\Delta$ [...] for any $0\leq n\leq N$.

Figures (3)

Figure 1: An illustration of Algorithm \ref{alg.hybrid}.
Figure 2: An illustration of a V-cycle of the multigrid method.
Figure 3: An illustration of Algorithm \ref{alg.V.full}.

Theorems & Definitions (1)

Theorem 1: Theorem D.1 in tai2024pottsmgnet
