Contractivity of neural ODEs: an eigenvalue optimization problem

Nicola Guglielmi; Arturo De Marinis; Anton Savostianov; Francesco Tudisco

TL;DR

The paper tackles stability and robustness of neural ODEs by formulating a worst-case contractivity problem in terms of the logarithmic norm: $\mu^\star=\max_{D\in\Omega_m}\mu_2(DA)$, with $\mu_2(B)=\lambda_{\max}(\mathrm{Sym}(B))$ and $\Omega_m$ the set of admissible diagonal matrices $D$ with entries in $[m,1]$. It introduces a two-level algorithm: an inner gradient-system method that, for a fixed interval $[m,1]$, finds extremizers $D[m]$ minimizing $F(D)=-\mu_2(DA)$, and an outer Newton-bisection procedure that identifies the smallest $m^\star$ such that $\mu_2(DA)\le 0$ for all admissible $D$. The method is extended to time-varying and multilayer networks. It is complemented by a combinatorial relaxation that speeds up computations and by time-dependent extensions that yield a dynamic stability bound. Applied to an MNIST classifier, shifts in the weight matrix ensure contractivity and improve adversarial robustness. The framework offers a principled route to designing contractive neural ODEs, with practical impact on the robustness and reliability of continuous-depth models.
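
To make the quantity $\max_{D}\mu_2(DA)$ concrete, the following is a minimal sketch, not the paper's gradient-system method. It assumes the admissible set is the box of diagonal matrices with entries in $[m,1]$ and uses the fact that $\lambda_{\max}(\mathrm{Sym}(DA))$ is convex in the diagonal of $D$, so its maximum over the box is attained at a vertex. Enumerating all $2^n$ vertices is only feasible for small $n$. The names `log_norm_2` and `worst_case_log_norm`, and the example matrix `A`, are hypothetical.

```python
import itertools
import numpy as np

def log_norm_2(B):
    """Logarithmic 2-norm: largest eigenvalue of the symmetric part of B."""
    sym = 0.5 * (B + B.T)
    return float(np.linalg.eigvalsh(sym)[-1])  # eigvalsh sorts ascending

def worst_case_log_norm(A, m):
    """Maximize mu_2(D A) over diagonal D with entries in [m, 1].

    Brute-force vertex enumeration (2**n candidates), an illustrative
    stand-in for the paper's inner gradient-system iteration.
    """
    n = A.shape[0]
    best_val, best_diag = -np.inf, None
    for diag in itertools.product((m, 1.0), repeat=n):
        d = np.array(diag)
        val = log_norm_2(d[:, None] * A)  # row scaling equals diag(d) @ A
        if val > best_val:
            best_val, best_diag = val, d
    return best_val, best_diag

# Hypothetical 3x3 example matrix, chosen only for illustration.
A = np.array([[-2.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.0, 1.0, -2.0]])
value, diag = worst_case_log_norm(A, 0.5)
print(value, diag)
```

Because $D=I$ is always one of the vertices, the returned value can never be smaller than $\mu_2(A)$.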

Abstract

We propose a novel methodology to solve a key eigenvalue optimization problem which arises in the contractivity analysis of neural ODEs. When looking at contractivity properties of a one-layer weight-tied neural ODE $\dot{u}(t)=\sigma(Au(t)+b)$ (where $u,b \in \mathbb{R}^n$, $A$ is a given $n \times n$ matrix, $\sigma: \mathbb{R} \to \mathbb{R}$ denotes an activation function and, for a vector $z \in \mathbb{R}^n$, $\sigma(z) \in \mathbb{R}^n$ is interpreted entry-wise), we are led to study the logarithmic norm of a set of products of type $DA$, where $D$ is a diagonal matrix such that $\mathrm{diag}(D) \in \sigma'(\mathbb{R}^n)$. Specifically, given a real number $c$ (usually $c=0$), the problem consists in finding the largest positive interval $\text{I}\subseteq [0,\infty)$ such that the logarithmic norm $\mu(DA) \le c$ for all diagonal matrices $D$ with $D_{ii}\in \text{I}$. We propose a two-level nested methodology: an inner level where, for a given $\text{I}$, we compute an optimizer $D^\star(\text{I})$ by a gradient system approach, and an outer level where we tune $\text{I}$ so that the value $c$ is reached by $\mu(D^\star(\text{I})A)$. We extend the proposed two-level approach to the general multilayer, and possibly time-dependent, case $\dot{u}(t) = \sigma( A_k(t) \ldots \sigma( A_{1}(t) u(t) + b_{1}(t) ) \ldots + b_{k}(t) )$ and we propose several numerical examples to illustrate its behaviour, including its stabilizing performance on a one-layer neural ODE applied to the classification of the MNIST handwritten digits dataset.
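
The two-level structure described above can be illustrated on a tiny example. The sketch below rests on several assumptions: the interval is parameterized as $\text{I}=[m,1]$, the logarithmic norm is the 2-norm one, the inner problem is solved by brute-force vertex enumeration rather than by the gradient system, and the outer level uses plain bisection rather than the paper's Newton-bisection. The names `phi` and `smallest_m` and the matrix `A` are hypothetical. The bisection relies on the worst-case value being nonincreasing in $m$, since the admissible box shrinks as $m$ grows.

```python
import itertools
import numpy as np

def log_norm_2(B):
    """Logarithmic 2-norm: largest eigenvalue of the symmetric part of B."""
    return float(np.linalg.eigvalsh(0.5 * (B + B.T))[-1])

def phi(A, m):
    """Inner level (illustrative): worst case of mu_2(D A) over diagonal D
    with entries in [m, 1], by enumerating the 2**n vertices of the box."""
    n = A.shape[0]
    return max(log_norm_2(np.array(d)[:, None] * A)
               for d in itertools.product((m, 1.0), repeat=n))

def smallest_m(A, c=0.0, tol=1e-10):
    """Outer level (illustrative): smallest m in [0, 1] with phi(A, m) <= c."""
    if phi(A, 1.0) > c:
        raise ValueError("mu_2(A) > c: no interval [m, 1] is admissible")
    if phi(A, 0.0) <= c:
        return 0.0
    lo, hi = 0.0, 1.0  # invariant: phi(lo) > c >= phi(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(A, mid) <= c:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical 2x2 example with mu_2(A) = -0.25 < 0.
A = np.array([[-1.0, 1.5],
              [0.0, -1.0]])
m_star = smallest_m(A)
print(m_star)
```

For this matrix the threshold can be checked by hand: $\mathrm{Sym}(DA)$ is negative semidefinite exactly when $d_2 \ge 0.5625\, d_1$, and the worst case $d_1=1$, $d_2=m$ gives $m = 9/16$, so the sketch should return approximately $0.5625$.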

Paper Structure (29 sections, 10 theorems, 102 equations, 5 figures, 4 tables, 1 algorithm)

Introduction
Contractivity
Uniform contractivity versus asymptotic stability
Overview of our contribution
Paper organization
A two-level nested iterative method for solving (P1)
Inner problem: a gradient system approach
Inequality constraints: admissible directions
Constrained gradient system
Structure of extremizers: a theoretical result
Numerical integration
Multilayer networks
Outer iteration: computing $m^\star$
The value $\mu_2(A)$ is strongly unspecific.
A theoretical upper bound for $m^\star$
...and 14 more sections

Key Result

Lemma 3.1

[Kato2013] Consider a continuously differentiable matrix-valued function $C(\tau): \mathbb{R} \to \mathbb{R}^{n,n}$, with $C(\tau)$ symmetric. Let $\lambda(\tau)$ be a simple eigenvalue of $C(\tau)$ for all $\tau$ and let $x(\tau)$ with $\|x(\tau)\|=1$ be the associated (right and left)

Figures (5)

Figure 1: The function $\phi[m]$ for the example matrix \ref{ex:2}.
Figure 2: The behaviour of the $3$ entries of $x$ (left) and $z=A x$ (right) for extremizers for problem \ref{ex:2} as a function of $m$.
Figure 3: Behaviour of $m^\star$ as a function of $t$, for two different matrix-valued functions $A(t)$.
Figure 4: The behaviour (for fixed $m$) of $\mu_2(t)$ (see \ref{eq:mut}) for the matrix-valued function \ref{ex:4} as a function of $t$.
Figure 6: Losses (on the left) and accuracy (on the right) behaviour during training for ODEnet, stabilized stabODEnet, SODEF, and asymODEnet.

Theorems & Definitions (24)

Definition 2.1
Lemma 3.1
Lemma 3.2
proof
Theorem 3.1: necessary condition for an extremizer to assume intermediate values in $(m,1)$
proof
Lemma 3.3
proof
Lemma 3.4
Lemma 3.5
...and 14 more
