On Optimal Regularization Parameters via Bilevel Learning

Matthias J. Ehrhardt, Silvia Gazzola, Sebastian J. Scott

TL;DR

This work investigates bilevel learning for optimal regularization parameter selection in variational regularization of linear inverse problems. It derives a new sufficient condition, based on Bregman distances, that guarantees the optimal parameter $\alpha$ is strictly positive when the forward operator $A$ is injective and the regularizer is convex, bounded below, and continuously differentiable; the condition involves $B=(A^TA)^{-1}$ and the least-squares solution $x^0$. The authors extend the analysis to an expected predictive-risk upper level with invertible $A$ and provide pointwise (denoising) corollaries, including a guaranteed positivity result under zero-mean noise when the regularizer is strictly convex. Numerical experiments in low- and high-dimensional settings validate the theory and show that the new condition characterizes positivity more sharply than existing criteria, with practical implications for regularizer design and parameter tuning in imaging tasks. Overall, the results deepen the theoretical understanding of bilevel learning as a regularization-parameter selection strategy and demonstrate its robustness across denoising and deconvolution applications.

Abstract

Variational regularization is commonly used to solve linear inverse problems, and involves augmenting a data fidelity by a regularizer. The regularizer is used to promote a priori information and is weighted by a regularization parameter. Selection of an appropriate regularization parameter is critical, with various choices leading to very different reconstructions. Classical strategies used to determine a suitable parameter value include the discrepancy principle and the L-curve criterion, and in recent years a supervised machine learning approach called bilevel learning has been employed. Bilevel learning is a powerful framework to determine optimal parameters and involves solving a nested optimization problem. While previous strategies enjoy various theoretical results, the well-posedness of bilevel learning in this setting is still an open question. In particular, a necessary property is positivity of the determined regularization parameter. In this work, we provide a new condition that better characterizes positivity of optimal regularization parameters than the existing theory. Numerical results verify and explore this new condition for both small and high-dimensional problems.
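The nested structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Tikhonov regularizer $\mathcal{R}(x)=\frac{1}{2}\|x\|^2$ (chosen only because the lower-level problem then has a closed-form solution), a single synthetic training pair, and a plain grid search in place of a real bilevel solver. The function names `lower_level`, `upper_level`, and `learn_alpha` are hypothetical.

```python
import numpy as np

def lower_level(A, y, alpha):
    """Solve min_x 0.5*||A x - y||^2 + alpha * 0.5*||x||^2 in closed form.

    Assumes A is injective, so A^T A is invertible even when alpha = 0.
    """
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y)

def upper_level(A, y, x_true, alpha):
    """Upper-level cost: squared error between reconstruction and ground truth."""
    x_alpha = lower_level(A, y, alpha)
    return 0.5 * np.sum((x_alpha - x_true) ** 2)

def learn_alpha(A, y, x_true, alphas):
    """Grid search over candidate parameters (a stand-in for a bilevel solver)."""
    losses = np.array([upper_level(A, y, x_true, a) for a in alphas])
    return alphas[int(np.argmin(losses))], losses

# Synthetic training pair (assumed data, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))      # tall Gaussian matrix, injective almost surely
x_true = rng.standard_normal(5)
y = A @ x_true + 0.5 * rng.standard_normal(20)

# Include alpha = 0 so the unregularized least-squares solution is a candidate.
alphas = np.concatenate(([0.0], np.logspace(-4, 2, 200)))
alpha_hat, losses = learn_alpha(A, y, x_true, alphas)
print("learned alpha:", alpha_hat)
```

With `alpha = 0` the lower-level solution reduces to the least-squares solution $x^0 = (A^TA)^{-1}A^Ty$, so comparing `losses[0]` with `losses.min()` shows whether regularization helped for this particular training pair.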

Paper Structure (16 sections, 17 theorems, 93 equations, 11 figures, 2 tables)

This paper contains 16 sections, 17 theorems, 93 equations, 11 figures, 2 tables.

Key Result

Theorem 1

Let $A$ be injective and let $\mathcal{R}$ be convex, bounded below, and continuously differentiable. If … and …, then $0$ is not a solution to the bilevel learning problem (eq:bilevel).

Figures (11)

  • Figure 1: The case $n=1$, $\mathcal{R}(x) = \frac{1}{2}\|x\|^2$. Plot of regularizer evaluations and Bregman distances $D_\mathcal{R}(\cdot,z)$ for different choices of $z$.
  • Figure 2: The case $n=1$, $\mathcal{R}(x) = \mathrm{hub}_\gamma(x)$, $\gamma=0.01$. Plot of regularizer evaluations and Bregman distances $D_\mathcal{R}(\cdot,z)$ for different choices of $z$.
  • Figure 3: The case $n=1$, $\mathcal{R}(x) = \frac{1}{2}\|x\|^2$. (a) shows a plot of regularizer evaluations and illustration of the new condition in the case $A=1$. While the old condition demands that the red diamond is higher than the blue diamond, the new condition only requires that the red diamond is higher than the blue circle. Regions of $y$ for which the old and new conditions are satisfied are indicated by the striped and orange regions on the horizontal axis, respectively. We see that the two conditions describe different regions and the striped region is a subset of the shaded region. (b) shows a plot of regularizer evaluations and illustration of the new condition in the case $A=\sqrt{3}$. The new condition only demands that the red circle is higher than the blue circle.
  • Figure 4: Visualisation of Corollary \ref{cory:pointwise}. Plots of the reconstruction space $\Omega$ for the trivial forward operator ($A_1$) setting and various choices of regularizer, with the condition boundaries as detailed in Table \ref{tab:numerics-colour-legend}. The ground truth $x^\star=[1,0.5]^T$ is represented by a yellow star, and level sets of the upper level cost function are visible. The region where $0$ is a solution to \ref{eq:bilevel} is shaded yellow. The choice of regularizer is indicated in each subcaption.
  • Figure 5: Visualisation of Corollary \ref{cory:pointwise}. Plots of the reconstruction space $\Omega$ for the non-trivial forward operator ($A_2$) setting and various choices of regularizer, with the condition boundaries as detailed in Table \ref{tab:numerics-colour-legend}. The ground truth $x^\star=[1,0.5]^T$ is represented by a yellow star, and level sets of the upper level cost function are visible. The region where $0$ is a solution to \ref{eq:bilevel} is shaded yellow. The choice of regularizer is indicated in each subcaption.
  • ...and 6 more figures
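
The Bregman distances plotted in Figures 1 and 2 can be evaluated with a short sketch. It uses the standard definition for a differentiable convex function, $D_\mathcal{R}(x,z) = \mathcal{R}(x) - \mathcal{R}(z) - \langle \nabla\mathcal{R}(z), x - z\rangle$. The Huber convention below (quadratic for $|x|\le\gamma$, linear otherwise) is an assumption and may differ from the paper's $\mathrm{hub}_\gamma$ by scaling; the helper names are hypothetical.

```python
import numpy as np

def bregman(R, grad_R, x, z):
    """Bregman distance D_R(x, z) = R(x) - R(z) - <grad R(z), x - z>."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    z = np.atleast_1d(np.asarray(z, dtype=float))
    return R(x) - R(z) - np.dot(grad_R(z), x - z)

# Quadratic regularizer R(x) = 0.5 * ||x||^2.
def quad(x):
    return 0.5 * np.dot(x, x)

def quad_grad(x):
    return x

# Huber regularizer (assumed convention): x^2/(2*gamma) for |x| <= gamma,
# |x| - gamma/2 otherwise, summed over components.
GAMMA = 0.01

def huber(x):
    a = np.abs(x)
    return np.sum(np.where(a <= GAMMA, a ** 2 / (2 * GAMMA), a - GAMMA / 2))

def huber_grad(x):
    return np.where(np.abs(x) <= GAMMA, x / GAMMA, np.sign(x))

# For the quadratic, D_R(x, z) equals 0.5 * ||x - z||^2.
d_quad = bregman(quad, quad_grad, [2.0], [0.5])

# For Huber, two distinct points on the same linear branch have zero distance,
# reflecting that this regularizer is convex but not strictly convex.
d_hub_same_branch = bregman(huber, huber_grad, [2.0], [1.0])
d_hub_from_zero = bregman(huber, huber_grad, [1.0], [0.0])
print(d_quad, d_hub_same_branch, d_hub_from_zero)
```

The vanishing Huber distance between distinct points is one way to see why strict convexity appears as an extra assumption in the zero-mean-noise positivity result mentioned in the TL;DR.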

Theorems & Definitions (45)

  • Definition 1: Bregman distance
  • Example 1
  • Example 2
  • Example 3
  • Theorem 1: Positivity of the bilevel learning problem solution
  • Corollary 1
  • Remark 1
  • Corollary 2
  • Remark 2
  • Theorem 2
  • ...and 35 more