Monotone, Bi-Lipschitz, and Polyak-Lojasiewicz Networks

Ruigang Wang; Krishnamurthy Dvijotham; Ian R. Manchester

TL;DR

This work develops BiLipNet, a bi-Lipschitz invertible neural network built from strongly monotone, Lipschitz residual layers ($\mathcal{F}(x)=\mu x+\mathcal{H}(x)$) and orthogonal layers, whose inverse is computed with a three-operator splitting solver. It also introduces PLNet, a scalar-output network formed by composing a BiLipNet with a quadratic potential; the resulting network satisfies the Polyak-Łojasiewicz condition and therefore has a unique, efficiently computable global minimum. The core technical advances are a feed-through network (FTN) for enhanced expressivity, certification based on incremental quadratic constraints (IQCs) that yields tighter bounds than spectral normalization, and a direct parameterization that enables scalable constrained design. The paper further demonstrates applications in uncertainty quantification and surrogate loss learning, showing improved distance awareness of predictions and reliable optimization of non-convex surrogate objectives. Overall, BiLipNet and PLNet offer certified, invertible, and easily optimizable neural architectures for trustworthy ML and surrogate optimization.
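
Inverting a strongly monotone residual layer can be illustrated on a toy example. The sketch below is a minimal illustration under stated assumptions, not the paper's solver: instead of three-operator splitting it uses the plain forward iteration $x \leftarrow x - \alpha(\mathcal{F}(x) - y)$, which is a contraction for any $\mu$-strongly monotone, $\nu$-Lipschitz map when $\alpha = \mu/\nu^2$. The helper `make_monotone_residual`, with $\mathcal{H}(x) = c\,W^\top \tanh(Wx+b)$, is a hypothetical stand-in for the paper's feed-through network.

```python
import numpy as np


def make_monotone_residual(n, m, mu, gamma, seed=0):
    """Toy layer F(x) = mu*x + H(x) (hypothetical stand-in, not the paper's FTN).

    H(x) = c * W^T tanh(W x + b) is the gradient of the convex function
    c * sum(log cosh(W x + b)), hence monotone; its Jacobian has spectral norm
    at most c * ||W||_2^2, so c = gamma / ||W||_2^2 makes H gamma-Lipschitz.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, n))
    b = rng.standard_normal(m)
    c = gamma / np.linalg.norm(W, 2) ** 2

    def F(x):
        return mu * x + c * W.T @ np.tanh(W @ x + b)

    return F


def invert(F, y, mu, nu, tol=1e-10, max_iter=10_000):
    """Solve F(x) = y with the forward iteration x <- x - alpha * (F(x) - y).

    For mu-strongly monotone, nu-Lipschitz F, alpha = mu / nu**2 makes the
    update a contraction with factor sqrt(1 - (mu / nu)**2).
    """
    alpha = mu / nu**2
    x = np.zeros_like(y)
    for _ in range(max_iter):
        r = F(x) - y
        if np.linalg.norm(r) < tol:
            break
        x = x - alpha * r
    return x


mu, gamma = 0.5, 1.5
nu = mu + gamma  # F is mu-strongly monotone and (mu + gamma)-Lipschitz
F = make_monotone_residual(n=4, m=8, mu=mu, gamma=gamma)
x_true = np.array([1.0, -2.0, 0.5, 3.0])
x_hat = invert(F, F(x_true), mu, nu)
print(np.allclose(x_hat, x_true, atol=1e-8))  # True
```

Because strong monotonicity makes $\mathcal{F}$ injective, the recovered point is the unique preimage; the paper's splitting formulation is aimed at converging much faster than this simple fixed step.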

Abstract

This paper presents a new bi-Lipschitz invertible neural network, the BiLipNet, which can smoothly control both its Lipschitzness (output sensitivity to input perturbations) and its inverse Lipschitzness (distinguishability of different inputs from their outputs). The second main contribution is a new scalar-output network, the PLNet, which is a composition of a BiLipNet and a quadratic potential. We show that PLNet satisfies the Polyak-Lojasiewicz condition and can be applied to learn non-convex surrogate losses with a unique and efficiently computable global minimum. The central technical element in these networks is a novel invertible residual layer with certified strong monotonicity and Lipschitzness, which we compose with orthogonal layers to build the BiLipNet. The certification of these properties is based on incremental quadratic constraints, resulting in much tighter bounds than can be achieved with spectral normalization. Moreover, we formulate the calculation of the inverse of a BiLipNet (and hence the minimum of a PLNet) as a series of three-operator splitting problems, for which fast algorithms can be applied.
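
Why composing a BiLipNet $\mathcal{G}$ with a quadratic potential yields the Polyak-Lojasiewicz property can be seen with the hypothetical potential $f(x)=\tfrac12\|\mathcal{G}(x)\|^2$ (the paper's exact potential may differ, e.g. by offsets). If every Jacobian $J(x)$ of $\mathcal{G}$ has singular values at least $\mu$, then $\tfrac12\|\nabla f(x)\|^2 = \tfrac12\|J(x)^\top \mathcal{G}(x)\|^2 \ge \mu^2 f(x)$, and the global minimum $f^\star = 0$ is attained at the unique point $\mathcal{G}^{-1}(0)$. The sketch below checks this inequality numerically; `G`, `jac_G`, and all sizes are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, mu, gamma = 3, 6, 0.5, 1.5
W = rng.standard_normal((m, n))
b = rng.standard_normal(m)
c = gamma / np.linalg.norm(W, 2) ** 2


def G(x):
    """Hypothetical stand-in for a BiLipNet: mu-strongly monotone, (mu + gamma)-Lipschitz."""
    return mu * x + c * W.T @ np.tanh(W @ x + b)


def jac_G(x):
    """Jacobian mu*I + c * W^T diag(tanh'(W x + b)) W (symmetric, eigenvalues >= mu)."""
    d = 1.0 - np.tanh(W @ x + b) ** 2
    return mu * np.eye(n) + c * W.T @ (d[:, None] * W)


def f(x):
    """PLNet-style scalar output: quadratic potential 0.5 * ||G(x)||^2, minimum value 0."""
    g = G(x)
    return 0.5 * g @ g


def grad_f(x):
    return jac_G(x).T @ G(x)


# Check the PL inequality 0.5 * ||grad f(x)||^2 >= mu^2 * (f(x) - 0) at random points.
ratios = []
for _ in range(1000):
    x = 3.0 * rng.standard_normal(n)
    g = grad_f(x)
    ratios.append(0.5 * g @ g / f(x))
pl_ok = min(ratios) >= mu**2 * (1 - 1e-9)  # small slack for floating-point rounding
print(pl_ok)  # True
```

Since the PL inequality rules out stationary points other than the global minimizer, gradient-based search on such an $f$ cannot stall at a spurious local minimum.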

Paper Structure (52 sections, 6 theorems, 59 equations, 11 figures, 3 tables)

This paper contains 52 sections, 6 theorems, 59 equations, 11 figures, 3 tables.

Introduction
Contributions
Related work
Bi-Lipschitz invertible layer.
IQC-based Lipschitz estimation and training.
Bi-Lipschitz networks for learning-based surrogate optimization.
Preliminaries
Surrogate loss learning.
Monotone and bi-Lipschitz Networks
Feed-through network
SDP conditions for monotonicity and Lipschitzness
Model parameterization
Number of free parameters.
Bi-Lipschitz networks
Partially bi-Lipschitz networks.
...and 37 more sections

Key Result

Theorem 3.2

${\mathcal{F}}$ is $\mu$-strongly monotone and $\nu$-Lipschitz if there exists a $\Lambda \in {\mathbb{D}}_+^m$, where ${\mathbb{D}}_+^m$ is the set of $m\times m$ positive diagonal matrices, such that the following conditions hold: … where $\gamma=\nu-\mu>0$.
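
The role of $\gamma=\nu-\mu$ can be seen from an elementary sufficient argument (a sketch, not the theorem's SDP-based proof): if $\mathcal{H}$ is monotone and $\gamma$-Lipschitz, then $\mathcal{F}(x)=\mu x+\mathcal{H}(x)$ is $\mu$-strongly monotone and $\nu$-Lipschitz. With $\Delta x = x'-x$:

```latex
\begin{aligned}
\langle \mathcal{F}(x')-\mathcal{F}(x),\,\Delta x\rangle
  &= \mu\|\Delta x\|^2 + \langle \mathcal{H}(x')-\mathcal{H}(x),\,\Delta x\rangle
   \ \ge\ \mu\|\Delta x\|^2, \\
\|\mathcal{F}(x')-\mathcal{F}(x)\|
  &\le \mu\|\Delta x\| + \|\mathcal{H}(x')-\mathcal{H}(x)\|
   \ \le\ (\mu+\gamma)\|\Delta x\| = \nu\|\Delta x\|.
\end{aligned}
```

Strong monotonicity also gives the inverse Lipschitz bound: by Cauchy-Schwarz, $\mu\|\Delta x\|^2 \le \|\Delta y\|\,\|\Delta x\|$, so $\|\Delta y\| \ge \mu\|\Delta x\|$.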

Figures (11)

Figure 1: Fitting a step function, which is not Lipschitz, with certified $(0.1, 10)$-Lipschitz models. Compared to the analytically-computed optimum, the proposed BiLipNet achieves much tighter bounds than models based on spectral normalization.
Figure 2: This figure depicts the possible ranges of $\Delta y={\mathcal{F}}(x')-{\mathcal{F}}(x)$ on $\mathbb{R}^2$ for a given $\Delta x=x'-x$. The ring (blue area) corresponds to a $(\mu,\nu)$-Lipschitz ${\mathcal{F}}$, while the half moon (red area) corresponds to a $\mu$-strongly monotone and $\nu$-Lipschitz ${\mathcal{F}}$. The largest angle $\alpha$ between $\Delta x$ and $\Delta y$ satisfies $\cos\alpha=\tau^{-1}$, where $\tau=\nu/\mu$ is the distortion.
Figure 3: The proposed invertible residual network ${\mathcal{F}}(x)=\mu x+{\mathcal{H}}(x)$ where the nonlinear block ${\mathcal{H}}$ is a feed-through network, whose hidden layers are directly connected to the input and output.
Figure 4: Predictive uncertainty of different NGPs with the same bi-Lipschitz bound. Points in the dark-blue regions are classified as in-distribution data, and points in the remaining regions as OOD data. Light blue and orange points (different colors indicate different labels) are training samples from the two-moons dataset. The red points are OOD test examples. With small distortion, our model can still distinguish the training data from the OOD data, achieving results similar to those of SNGP with large distortion.
Figure 5: Surrogate loss learning for the 20-dimensional Rosenbrock function. Comparison of training and test error versus model distortion for PLNet with different bi-Lipschitz models.
...and 6 more figures
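
The angle bound in Figure 2 can be checked on a two-dimensional linear example. Assuming $\mathcal{F}(x)=Ax$ with $A=\mu I + cK$, where $K$ is the $90^\circ$ rotation and $c=\sqrt{\nu^2-\mu^2}$, one gets $\langle Ax,x\rangle=\mu\|x\|^2$ and $\|Ax\|=\nu\|x\|$, so every $\Delta y$ sits exactly at the largest admissible angle of the half moon, $\cos\alpha=\mu/\nu=\tau^{-1}$. The map $B=-\mu I$ is included as a contrast: it lies in the ring but violates monotonicity. Both matrices are illustrative assumptions, not models from the paper.

```python
import numpy as np

mu, nu = 1.0, 4.0
tau = nu / mu  # distortion
c = np.sqrt(nu**2 - mu**2)
K = np.array([[0.0, -1.0], [1.0, 0.0]])  # 90-degree rotation (skew-symmetric)
A = mu * np.eye(2) + c * K

rng = np.random.default_rng(0)
cosines = []
for _ in range(500):
    dx = rng.standard_normal(2)
    dy = A @ dx  # linear map, so F(x') - F(x) = A (x' - x)
    # strong monotonicity holds with equality: <dy, dx> = mu * |dx|^2
    assert np.isclose(dy @ dx, mu * (dx @ dx))
    # Lipschitz bound holds with equality: |dy| = nu * |dx|
    assert np.isclose(np.linalg.norm(dy), nu * np.linalg.norm(dx))
    cosines.append(dy @ dx / (np.linalg.norm(dy) * np.linalg.norm(dx)))

print(np.allclose(cosines, 1.0 / tau))  # True: every pair sits at the boundary angle

# Contrast: B is (mu, mu)-bi-Lipschitz (inside the ring) but not monotone.
B = -mu * np.eye(2)
v = np.array([1.0, 2.0])
print((B @ v) @ v < 0)  # True: Delta y falls outside the half moon
```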

Theorems & Definitions (17)

Definition 2.1
Definition 2.2
Definition 2.3
Remark 3.1
Theorem 3.2
Remark 3.3
Definition 3.4
Proposition 3.5
Proposition 4.1
Proposition 4.2
...and 7 more