Double-well Net for Image Segmentation

Hao Liu, Jun Liu, Raymond H. Chan, Xue-Cheng Tai

TL;DR

The paper addresses image segmentation by combining the Potts model with deep neural networks. It introduces two operator-splitting-based networks (DN-I and DN-II) that learn the region force via a UNet and solve a Potts functional through an extended MBO scheme. Model I uses a data-driven region force $F(f)$ with learnable linear controls, while Model II generalizes to a nonlinear operator $G(u,f,\mathbf{x},t)$ represented by a UNet, with each time step viewed as a neural-network block. The authors report higher accuracy and Dice scores on the MARA10K, ECSSD, and RITE datasets while using far fewer parameters than the comparison models, and show that DN-I can reveal and progressively learn the effective region force. The work provides a principled integration of variational models with neural networks, yielding a multiscale segmentation framework derived from operator splitting and the MBO scheme.

Abstract

In this study, our goal is to integrate classical mathematical models with deep neural networks by introducing two novel deep neural network models for image segmentation known as Double-well Nets. Drawing inspiration from the Potts model, our models leverage neural networks to represent a region force functional. The widely recognized Potts model is approximated using a double-well potential and then solved by an operator-splitting method, which turns out to be an extension of the well-known MBO (Merriman-Bence-Osher) scheme. Subsequently, we replace the region force functional in the Potts model with a UNet-type network, which is data-driven and designed to capture multiscale features of images, and we also introduce control variables to enhance effectiveness. The resulting algorithm is a neural network activated by a function that minimizes the double-well potential. What sets our proposed Double-well Nets apart from many existing deep learning methods for image segmentation is their strong mathematical foundation. They are derived from network approximation theory and employ the MBO scheme to approximately solve the Potts model. By incorporating mathematical principles, Double-well Nets bridge the MBO scheme and neural networks, and offer an alternative perspective for designing networks with mathematical backgrounds. Through comprehensive experiments, we demonstrate the performance of Double-well Nets, showcasing their superior accuracy and robustness compared to state-of-the-art neural networks. Overall, our work represents a valuable contribution to the field of image segmentation by combining the strengths of classical variational models and deep neural networks. The Double-well Nets introduce an innovative approach that leverages mathematical foundations to enhance segmentation performance.
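
To make the diffuse-then-threshold idea behind the MBO scheme concrete, the following is a minimal sketch of a classical two-phase threshold-dynamics iteration with a hand-crafted region force. It is not the authors' DN-I or DN-II: the paper replaces the force with a learned UNet and uses a double-well-based activation, neither of which is reproduced here. The function names, the constants `c_in` and `c_out`, the sign convention of the force, and the parameters `lam` and `tau` are all assumptions made for illustration.

```python
import numpy as np


def diffuse(u, tau, steps=5):
    """Approximate heat diffusion for time tau with explicit 5-point Laplacian steps.

    Periodic boundaries (via np.roll) are an illustrative simplification.
    The step size tau / steps must stay at or below 0.25 for stability.
    """
    dt = tau / steps
    for _ in range(steps):
        lap = (np.roll(u, 1, 0) + np.roll(u, -1, 0)
               + np.roll(u, 1, 1) + np.roll(u, -1, 1) - 4.0 * u)
        u = u + dt * lap
    return u


def region_force(f, c_in, c_out):
    """Hand-crafted force: negative where a pixel is closer to c_in than to c_out.

    Assumed stand-in for the learned region force described in the abstract.
    """
    return (f - c_in) ** 2 - (f - c_out) ** 2


def mbo_segment(f, c_in, c_out, lam=2.0, tau=1.0, iters=10):
    """Alternate diffusion (with a forcing term) and thresholding at 0.5."""
    u = (f > 0.5 * (c_in + c_out)).astype(float)  # initial guess by intensity
    force = region_force(f, c_in, c_out)
    for _ in range(iters):
        v = diffuse(u, tau) - tau * lam * force  # smooth, then push by the force
        u = (v > 0.5).astype(float)              # hard threshold back to {0, 1}
    return u > 0.5


# Synthetic example: a bright square on a dark background with mild noise.
rng = np.random.default_rng(0)
truth = np.zeros((32, 32), dtype=bool)
truth[8:24, 8:24] = True
f = truth.astype(float) + 0.05 * rng.standard_normal((32, 32))
mask = mbo_segment(f, c_in=1.0, c_out=0.0)
```

In this sketch the hard threshold plays the role that the abstract assigns to an activation minimizing the double-well potential, and each pass through the loop corresponds loosely to one network block in the time-stepping view.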

Paper Structure (18 sections, 41 equations, 18 figures, 2 tables)

This paper contains 18 sections, 41 equations, 18 figures, 2 tables.

Figures (18)

  • Figure 1: Illustration of a UNet-type network with input of size $N_1\times N_2\times D$. The left branch is the encoding part, the right branch is the decoding part, and the bottom rectangle denotes the bottleneck. $S$ denotes the number of resolution scales in the encoding and decoding parts. $c_k$ denotes the number of channels at resolution scale $k$. Wide arrows with gradient shadow represent downsampling operations. Wide arrows without gradient shadow represent upsampling operations. Horizontal dashed arrows represent skip connections. The orange rectangles denote the outputs of the encoding part that are passed to the decoding part via the skip connections. The length and width of each rectangle represent the output resolution and number of channels, respectively.
  • Figure 2: For model (\ref{eq.control}): (a) An illustration of a double-well block I (DB-I) with activation $Q_{\gamma}\circ \mathrm{Sig}$. (b) An illustration of the double-well net I (DN-I) with activation $Q_{\gamma}\circ \mathrm{Sig}$. The architecture in (a) is the detailed representation of the blocks $B_{\rm I}^k$ in (b). In both figures, disks represent input, output, and intermediate variables. The diamond represents the functional $F$. In (a), rectangles represent operations applied in a DB-I. In (b), sketched rectangles represent operations in the input layer and final layer, and normal rectangles represent DB-Is. To better present the architecture, some scalar factors are omitted.
  • Figure 3: For model (\ref{eq.control2}): (a) An illustration of a double-well block II (DB-II) with activation $Q_{\gamma}\circ \mathrm{Sig}$. (b) An illustration of the double-well net II (DN-II) with activation $Q_{\gamma}\circ \mathrm{Sig}$. The architecture in (a) is the detailed representation of the blocks $B_{\rm II}^k$ in (b). In both figures, disks represent input, output, and intermediate variables. In (a), rectangles represent operations applied in a DB-II. In (b), sketched rectangles represent operations in the input layer and final layer, and normal rectangles represent DB-IIs. To better present the architecture, some scalar factors are omitted.
  • Figure 4: Comparison of the histories of (a) training loss, (b) accuracy, and (c) Dice score of DN-I and DN-II with UNet, UNet++ (UN++), MANet (MAN), and DeepLabV3+ (DLV3+) on MARA10K.
  • Figure 5: Segmentation examples in the comparison of DN-I and DN-II with UNet, UNet++, MANet, and DeepLabV3+ on MARA10K.
  • ...and 13 more figures

Theorems & Definitions (5)

  • Remark 3.1
  • Remark 3.2
  • Remark 3.3
  • Remark 3.4
  • Remark 3.5