Regularization of Inverse Problems: Deep Equilibrium Models versus Bilevel Learning

Danilo Riccio; Matthias J. Ehrhardt; Martin Benning

TL;DR

It is shown that computing the lower-level optimization problem within the bilevel formulation with a fixed point iteration is a special case of the deep equilibrium framework.

Abstract

Variational regularization methods are commonly used to approximate solutions of inverse problems. In recent years, model-based variational regularization methods have often been replaced with data-driven ones such as the fields-of-experts model (Roth and Black, 2009). Training the parameters of such data-driven methods can be formulated as a bilevel optimization problem. In this paper, we compare the framework of bilevel learning for the training of data-driven variational regularization models with the novel framework of deep equilibrium models (Bai, Kolter, and Koltun, 2019) that has recently been introduced in the context of inverse problems (Gilton, Ongie, and Willett, 2021). We show that computing the lower-level optimization problem within the bilevel formulation with a fixed point iteration is a special case of the deep equilibrium framework. We compare both approaches computationally, with a variety of numerical examples for the inverse problems of denoising, inpainting and deconvolution.
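The central claim, that solving the lower-level problem with a fixed point iteration is an instance of a deep equilibrium model, can be illustrated with a minimal sketch. The code below is not the paper's implementation: it assumes a simple quadratic (Tikhonov) regularizer in place of a learned one, and the names `gradient_step`, `solve_fixed_point`, `alpha`, and `tau` are hypothetical. It shows that a gradient descent step on the lower-level objective defines a map whose fixed point is the minimizer, which is the kind of equilibrium a deep equilibrium model computes.

```python
import numpy as np

def gradient_step(x, y, A, alpha, tau):
    """One gradient descent step on the assumed lower-level objective
    0.5 * ||A x - y||^2 + 0.5 * alpha * ||x||^2.
    Viewed as a map x -> f(x, y), its fixed points are the minimizers."""
    grad = A.T @ (A @ x - y) + alpha * x
    return x - tau * grad

def solve_fixed_point(y, A, alpha, tau, tol=1e-10, max_iter=10_000):
    """Iterate x <- f(x, y) until successive iterates differ by at most tol."""
    x = np.zeros(A.shape[1])
    for _ in range(max_iter):
        x_next = gradient_step(x, y, A, alpha, tau)
        if np.linalg.norm(x_next - x) <= tol:
            return x_next
        x = x_next
    return x

# Synthetic toy problem (illustrative data, not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 10))
y = rng.standard_normal(20)
alpha = 0.5
# Step size 1/L, where L is the Lipschitz constant of the gradient.
tau = 1.0 / (np.linalg.norm(A, 2) ** 2 + alpha)

x_star = solve_fixed_point(y, A, alpha, tau)
# Closed-form minimizer of the quadratic objective, for comparison.
x_closed = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)
```

In this toy setting the equilibrium `x_star` matches the closed-form minimizer `x_closed` up to the solver tolerance. In a learned setting the regularization parameters would be trained, and gradients with respect to them would typically be obtained by differentiating through the fixed point condition rather than through the iterations.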

Paper Structure (27 sections, 43 equations, 17 figures)


Introduction
Deep equilibrium models and bilevel learning
Inverse problems
Deep equilibrium models
Bilevel learning
Bilevel learning as a deep equilibrium model
Why naïvely learning fixed points does not work
Architecture design and implementation
Deep equilibrium models
Deep equilibrium gradient descent
Bilevel learning models
Inverse problems
Denoising
Inpainting
Deblurring
...and 12 more sections

Figures (17)

Figure 1: Comparison between bilevel optimization and deep equilibrium models for each of the three considered inverse problems, namely denoising, inpainting, and deblurring, over the full range of parameters considered. The boxplots show the loss of the trained models evaluated on the test dataset. We removed all configurations with a final loss larger than $0.5$, a threshold we chose arbitrarily by looking for an empirical relation between the loss and the image quality.
Figure 2: Denoising the MNIST dataset. Visual comparison between bilevel method (left) and deep equilibrium model (right), with parameters $\tau=0.5$, $\gamma=0.1$, and $\sigma=\text{(ReLU)}$. Images are taken from the test dataset. The first row shows the original images; the second row is the model input. The last row is the output of the trained models.
Figure 3: Inpainting MNIST. Comparison between bilevel method (left) and deep equilibrium model (right), with parameters $\tau=0.5$, $\gamma=1.0$, and $\sigma=\text{(Softshrink)}$. Images are taken from the test dataset. The first row shows the original images; the second row shows the masked images, i.e., the input of the algorithm. The third row shows the result of applying the inpainting operator to the model output. The fourth row is the output of the trained models. Ideally, the difference between the second and third rows should be small.
Figure 4: Deblurring MNIST. Comparison between bilevel method (left) and deep equilibrium model (right), with parameters $\tau=0.5$, $\gamma=0.5$, and $\sigma=\text{(Softshrink)}$. Images are taken from the test dataset. The first row shows the original images; the second row is the model input. The last row is the output of the trained models. The third row shows the model output after we apply the convolution kernel to it. Ideally, the difference between the second and the third rows should be small.
Figure 5: Comparison of the test loss evaluated after each training epoch, for increasing noise levels in training (noise levels from top to bottom row: $0, 0.05, 0.1, 0.5, 1$). Simulations are grouped by task, namely denoising, inpainting, and deblurring (left, center, right columns). Each plot shows the simulation with the configuration that achieves the lowest final test loss.
...and 12 more figures
