Controlled Learning of Pointwise Nonlinearities in Neural-Network-Like Architectures

Michael Unser; Alexis Goujon; Stanislas Ducotterd

TL;DR

The paper develops a variational framework for learning freeform pointwise nonlinearities in layered architectures under slope constraints by penalizing the second-order total variation ${\rm TV}^{(2)}(f)$. It proves that the global optimum is an adaptive nonuniform linear spline and provides a discretization via nonuniform B-splines, enabling efficient training with explicit control of Lipschitz properties. By enforcing slope bounds, learned activations can be 1-Lipschitz, firmly non-expansive (proximal operators), monotone, or invertible, making them compatible with plug-and-play methods and unrolled optimization. The framework is demonstrated on function fitting, image denoising, and inverse problems (CT/MRI), with a DeepSplines toolbox released to enable practical adoption.
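For a scalar linear spline, each of these properties reduces to a condition on its slopes:

- **1-Lipschitz:** every slope lies in $[-1,1]$.
- **Monotone:** the slopes never change sign.
- **Invertible:** every slope has the same strict sign.
- **Firmly non-expansive:** every slope lies in $[0,1]$. For a scalar function this is equivalent to being the proximal operator of a convex function.

The Python sketch below checks these conditions for a spline given by ordered points. As one simple way of enforcing bounds, it also clips the slopes and rebuilds the values from the first point. The helper names (`spline_slopes`, `slope_properties`, `clip_slopes`) and the clipping heuristic are illustrative assumptions, not the paper's or the DeepSplines toolbox's API. Clipping is not the Euclidean projection onto the constraint set.

```python
import numpy as np


def spline_slopes(x, y):
    """Slopes of the piecewise-linear function through the ordered points (x_k, y_k).

    Assumed convention: the first and last slopes are also used for linear
    extrapolation to -inf and +inf.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size < 2 or np.any(np.diff(x) <= 0):
        raise ValueError("need >= 2 points with strictly increasing abscissae")
    return np.diff(y) / np.diff(x)


def slope_properties(x, y, tol=1e-12):
    """Check the slope conditions behind each property of a scalar linear spline."""
    s = spline_slopes(x, y)
    return {
        "1-Lipschitz": bool(np.all(np.abs(s) <= 1 + tol)),
        "monotone": bool(np.all(s >= -tol) or np.all(s <= tol)),
        "invertible": bool(np.all(s > tol) or np.all(s < -tol)),
        "firmly non-expansive": bool(np.all((s >= -tol) & (s <= 1 + tol))),
    }


def clip_slopes(x, y, s_min, s_max):
    """Heuristic (hypothetical): clip every slope to [s_min, s_max], rebuild values from y[0]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.clip(spline_slopes(x, y), s_min, s_max)
    return np.concatenate(([y[0]], y[0] + np.cumsum(s * np.diff(x))))


# A soft-threshold-like activation with slopes (1, 0, 1).
props = slope_properties([-2, -1, 1, 2], [-1, 0, 0, 1])
# Enforce slopes in [0, 1] on a spline whose slopes are (2, 0).
y_fne = clip_slopes([0, 1, 2], [0, 2, 2], 0.0, 1.0)
```

The soft-threshold-like example passes every test except invertibility, because its flat middle segment maps an interval to a single value.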

Abstract

We present a general variational framework for the training of freeform nonlinearities in layered computational architectures subject to some slope constraints. The regularization that we add to the traditional training loss penalizes the second-order total variation of each trainable activation. The slope constraints allow us to impose properties such as 1-Lipschitz stability, firm non-expansiveness, and monotonicity/invertibility. These properties are crucial to ensure the proper functioning of certain classes of signal-processing algorithms (e.g., plug-and-play schemes, unrolled proximal gradient, invertible flows). We prove that the global optimum of the stated constrained-optimization problem is achieved with nonlinearities that are adaptive nonuniform linear splines. We then show how to solve the resulting function-optimization problem numerically by representing the nonlinearities in a suitable (nonuniform) B-spline basis. Finally, we illustrate the use of our framework with the data-driven design of (weakly) convex regularizers for the denoising of images and the resolution of inverse problems.
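For linear splines the regularizer has a closed form. The derivative of a linear spline is piecewise constant, so ${\rm TV}^{(2)}(f)$ is the sum of the absolute slope changes at the knots.

The minimal Python sketch below rests on one assumption: the spline follows the canonical-interpolation convention of Figure 1. It passes through ordered points $(x_k,y_k)$, has knots only at the interior points, and is extended linearly beyond the outermost ones. The second function covers the special case of a uniform grid with step $h$. There, the same quantity becomes $\|{\rm D}^2\boldsymbol{c}\|_1/h$, where ${\rm D}^2$ is the second-order finite difference of the samples $\boldsymbol{c}$. Both function names are hypothetical.

```python
import numpy as np


def tv2_linear_spline(x, y):
    """TV^(2) of the linear spline through ordered points (x_k, y_k).

    Assumes linear extrapolation beyond x[0] and x[-1], so that the knots are
    the interior points x[1:-1]. Since f' is piecewise constant, TV^(2)(f) is
    the sum of the absolute jumps of f' at the knots.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slopes = np.diff(y) / np.diff(x)
    return float(np.sum(np.abs(np.diff(slopes))))


def tv2_uniform(c, h):
    """Same quantity for samples c on a uniform grid with step h: ||D2 c||_1 / h."""
    c = np.asarray(c, dtype=float)
    return float(np.sum(np.abs(c[2:] - 2.0 * c[1:-1] + c[:-2])) / h)


xs = np.linspace(-2.0, 2.0, 9)                       # uniform grid, step h = 0.5
relu_tv2 = tv2_linear_spline([-1, 0, 1], [0, 0, 1])  # one slope jump of size 1
abs_tv2 = tv2_linear_spline(xs, np.abs(xs))          # one slope jump of size 2
```

In a training loop, such a term would be computed for each trainable activation. It would then be weighted by a regularization parameter and added to the data-fidelity loss. Affine activations cost nothing under this penalty, so the regularizer favors activations with few knots.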

Paper Structure (16 sections, 6 theorems, 56 equations, 6 figures, 1 table)

Introduction
Mathematical Preliminaries
Continuity Bounds
Canonical Interpolation of an Ordered Set of Points
Representer Theorem for Constrained ${\rm TV}^{(2)}$ Minimization
Scalar Potentials Related to Linear Splines
Scalar Potential Specified Through its Derivative
Scalar Potential Specified Through its Proximity Operator
Algorithmic Framework for the Learning of Freeform Activations
Learning Activations in Deep Neural Architectures
Spline Parameterization and Training
Function-Fitting Experiments
Learned Potentials for Image Reconstruction
Learned Gradients
Learned Proximal Operators
...and 1 more section

Key Result

Theorem 1

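Any function $f:\mathbb{R}\to\mathbb{R}$ with finite second-order total variation is Lipschitz-continuous, and its Lipschitz constant is bounded by
$${\rm Lip}(f)\le\frac{1}{2}\Big(|f'(-\infty)|+|f'(+\infty)|+{\rm TV}^{(2)}(f)\Big),$$
where $f'(\pm\infty)=\lim_{x\to\pm\infty}f'(x)$. Moreover, this inequality is saturated if and only if $f$ is monotone-convex or monotone-concave.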
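Consider a linear spline with slopes $s_0,\dots,s_N$, extended linearly so that $f'(-\infty)=s_0$ and $f'(+\infty)=s_N$. Both ${\rm Lip}(f)=\max_k|s_k|$ and the bound $\tfrac12\big(|f'(-\infty)|+|f'(+\infty)|+{\rm TV}^{(2)}(f)\big)$ are then easy to compute, so the inequality can be checked numerically.

The Python sketch below assumes that point-based spline convention and uses a hypothetical helper name. Its three cases show:

- equality for a monotone-convex spline,
- strict inequality for a convex spline that is not monotone,
- strict inequality for a monotone spline that is not convex.

```python
import numpy as np


def lipschitz_and_bound(x, y):
    """Return (Lip(f), bound) for the linear spline through ordered points (x_k, y_k).

    Assumes linear extrapolation outside [x[0], x[-1]], so f'(-inf) and f'(+inf)
    equal the first and last slopes.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.diff(y) / np.diff(x)
    lip = float(np.max(np.abs(s)))
    tv2 = float(np.sum(np.abs(np.diff(s))))
    bound = 0.5 * (abs(s[0]) + abs(s[-1]) + tv2)
    return lip, float(bound)


cases = {
    "monotone-convex": ([0, 1, 2, 3], [0, 0, 1, 3]),       # slopes 0, 1, 2
    "convex, not monotone": ([-1, 0, 1], [1, 0, 1]),        # slopes -1, 1
    "monotone, not convex": ([0, 1, 2, 3], [0, 2, 2, 4]),   # slopes 2, 0, 2
}
results = {name: lipschitz_and_bound(*pts) for name, pts in cases.items()}
```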

Figures (6)

Figure 1: Canonical spline interpolants for two sets $\mathbb{P}_1$ and $\mathbb{P}_2$ of points represented as small circles in the plane. The filled circles are the spline knots (breakpoints), while the empty ones are the boundary points used for linear extrapolation. The two splines are linked because they are induced by a common (learnable) convex potential $\phi$ with $f_1=\phi'$ and $f_2={\rm prox}_{\phi}$. (See Section 4 for a detailed explanation.)
Figure 2: Interpolating basis functions associated with the grid points ${\boldsymbol{t}}=(-2,-1,1,4,5,9,9.5)$. The locations of the spline knots are marked by crosses.
Figure 3: Evolution of the loss during the training procedure, with an epoch corresponding to 1000 steps of SGD.
Figure 4: Performance summary of variational denoising with trainable analysis filters as a function of $\rho$ (modulus of weak convexity).
Figure 5: Learned potential $\phi$ and its derivative $\psi$.
...and 1 more figure
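The canonical interpolation of Figure 1 and the interpolating basis of Figure 2 can both be illustrated with a few lines of NumPy.

The sketch below makes two assumptions:

- It follows the convention stated in the Figure 1 caption: the interior points are the knots, and the two outermost points only set the slopes of the linear extrapolation.
- For illustration only, the basis functions of Figure 2 are taken to be the canonical interpolants of unit data on the grid $\boldsymbol{t}$.

The function names are hypothetical. Under these assumptions, any canonical interpolant is a linear combination of the basis functions, with its sample values as coefficients.

```python
import numpy as np


def canonical_interpolant(x, y):
    """Evaluate the linear spline through ordered points (x_k, y_k).

    Assumed convention (Figure 1 caption): x[1:-1] are the knots; the outer
    points x[0] and x[-1] only set the slopes for linear extrapolation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.size < 2 or np.any(np.diff(x) <= 0):
        raise ValueError("need >= 2 points with strictly increasing abscissae")
    s = np.diff(y) / np.diff(x)

    def f(t):
        t = np.asarray(t, dtype=float)
        out = np.interp(t, x, y)  # piecewise-linear interpolation on [x[0], x[-1]]
        out = np.where(t < x[0], y[0] + s[0] * (t - x[0]), out)       # left extrapolation
        return np.where(t > x[-1], y[-1] + s[-1] * (t - x[-1]), out)  # right extrapolation

    return f


def interpolating_basis(grid):
    """Hypothetical construction: phi_k is the canonical interpolant of the unit data e_k."""
    grid = np.asarray(grid, dtype=float)
    eye = np.eye(grid.size)
    return [canonical_interpolant(grid, eye[k]) for k in range(grid.size)]


t = np.array([-2.0, -1.0, 1.0, 4.0, 5.0, 9.0, 9.5])  # grid points of Figure 2
basis = interpolating_basis(t)
u = np.linspace(-5.0, 12.0, 341)

# A canonical interpolant equals the sum of basis functions weighted by its samples.
y = np.array([0.0, 1.0, -1.0, 2.0, 0.5, 3.0, 1.0])
f_direct = canonical_interpolant(t, y)(u)
f_basis = sum(yk * phi(u) for yk, phi in zip(y, basis))
```

In this construction the basis functions sum to one everywhere and reproduce linear functions exactly, extrapolated regions included. Both properties are inherited from the linearity of the interpolation in the data.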

Theorems & Definitions (16)

Theorem 1 (Aziznejad 2022)
Definition 1: Canonical interpolator
Theorem 2
Proof
Proposition 1
Proof (sketch)
Definition 2
Proposition 2: Spline derivative of a (weakly) convex potential
Proof
Proposition 3: Spline prox of a (weakly) convex potential
...and 6 more
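Propositions 2 and 3 concern the derivative and the proximal operator of the same (weakly) convex potential. The following derivation shows one way the two are connected in the linear-spline case:

1. The proximal point ${\rm prox}_{\phi}(v)=\arg\min_u \tfrac12(u-v)^2+\phi(u)$ solves $u+\phi'(u)=v$, so ${\rm prox}_{\phi}=({\rm Id}+\phi')^{-1}$.
2. Suppose $\phi'$ is a linear spline whose slopes $s_k$ all exceed $-1$, i.e., $\phi$ is $\rho$-weakly convex with $\rho<1$. Then ${\rm Id}+\phi'$ is a strictly increasing linear spline.
3. Its inverse is therefore the linear spline through the swapped points, with slopes $1/(1+s_k)$. These slopes lie in $(0,1]$ when $\phi$ is convex.

The Python sketch below is our own illustration of this derivation, with hypothetical names. It is not the paper's statement of Proposition 3.

```python
import numpy as np


def linear_spline(x, y):
    """Linear spline through ordered points (x_k, y_k), extended linearly outside."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.diff(y) / np.diff(x)

    def f(t):
        t = np.asarray(t, dtype=float)
        out = np.interp(t, x, y)
        out = np.where(t < x[0], y[0] + s[0] * (t - x[0]), out)
        return np.where(t > x[-1], y[-1] + s[-1] * (t - x[-1]), out)

    return f


def spline_prox_from_derivative(x, y):
    """prox_phi = (Id + phi')^{-1}, where phi' is the linear spline through (x_k, y_k).

    Requires every slope of phi' to exceed -1 (phi rho-weakly convex with rho < 1),
    so that Id + phi' is strictly increasing; its inverse is then the linear
    spline through the swapped points (x_k + y_k, x_k), with slopes 1 / (1 + s_k).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    s = np.diff(y) / np.diff(x)
    if np.any(s <= -1.0):
        raise ValueError("slopes of phi' must exceed -1 (weak-convexity modulus rho < 1)")
    return linear_spline(x + y, x)


# phi' = clip(10 u, -1, 1): derivative of a Huber-like convex potential.
x = np.array([-2.0, -0.1, 0.1, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])
dphi = linear_spline(x, y)
prox = spline_prox_from_derivative(x, y)

v = np.linspace(-5.0, 5.0, 101)
u = prox(v)  # a smoothed soft-thresholding with slopes 1, 1/11, 1
```

For a weakly convex potential, the prox slopes $1/(1+s_k)$ can exceed one. Each is at most $1/(1-\rho)$, so the prox is Lipschitz but not non-expansive in that case.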
