From NeurODEs to AutoencODEs: a mean-field control framework for width-varying Neural Networks

Cristina Cipriani, Massimo Fornasier, Alessandro Scagliotti

TL;DR

This paper proposes AutoencODE, a continuous-time Autoencoder obtained by modifying the controlled field that drives the dynamics of a NeurODE. The modification extends the mean-field control framework, originally devised for constant-width NeurODEs, to width-varying architectures, and the resulting theory motivates a training method tailored to Autoencoders with residual connections.

Abstract

The connection between Residual Neural Networks (ResNets) and continuous-time control systems (known as NeurODEs) has led to a mathematical analysis of neural networks that has provided interesting results of both theoretical and practical significance. However, by construction, NeurODEs have been limited to describing constant-width layers, making them unsuitable for modeling deep learning architectures with layers of variable width. In this paper, we propose a continuous-time Autoencoder, which we call AutoencODE, based on a modification of the controlled field that drives the dynamics. This adaptation enables the extension of the mean-field control framework originally devised for conventional NeurODEs. In this setting, we tackle the case of low Tikhonov regularization, resulting in potentially non-convex cost landscapes. While the results obtained for high Tikhonov regularization may not hold globally, we show that many of them can be recovered in regions where the loss function is locally convex. Inspired by our theoretical findings, we develop a training method tailored to this specific type of Autoencoders with residual connections, and we validate our approach through numerical experiments conducted on various examples.
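
To make the ResNet/NeurODE correspondence concrete, the following is a minimal sketch, not the authors' implementation. It discretizes a controlled ODE with forward Euler, so that each step is one residual layer, and it mimics a width-varying architecture by multiplying the velocity field with a 0/1 mask that freezes "turned off" components. The function name `euler_flow`, the `tanh` activation, the random weights, and the split of 20 layers into 5 initial, 10 encoding, and 5 decoding layers are all illustrative assumptions; only the three-phase structure (both components active, then only the second, then both again) follows the description of Figure 5.

```python
import numpy as np

def euler_flow(x0, weights, biases, masks, h):
    """Forward Euler for x' = m(t) * tanh(W(t) x + b(t)); one step per residual layer.

    masks[k] is a 0/1 vector: a 0 entry sets that component's velocity to zero,
    which is the (assumed) way a component is "turned off" in this sketch.
    """
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for W, b, m in zip(weights, biases, masks):
        x = x + h * m * np.tanh(W @ x + b)  # residual update x_{k+1} = x_k + h * F_k(x_k)
        traj.append(x.copy())
    return np.stack(traj)

rng = np.random.default_rng(0)
d, n_layers, T = 2, 20, 1.0
h = T / n_layers  # step size of the discretization

# Hypothetical, untrained parameters: one weight matrix and bias per layer.
weights = [rng.normal(size=(d, d)) for _ in range(n_layers)]
biases = [rng.normal(size=d) for _ in range(n_layers)]

# Hypothetical schedule: initial phase (both components active),
# encoding phase (only the second component active), decoding phase (both active).
both = np.array([1.0, 1.0])
second_only = np.array([0.0, 1.0])
masks = [both] * 5 + [second_only] * 10 + [both] * 5

traj = euler_flow(np.array([0.5, -0.3]), weights, biases, masks, h)
```

Here `traj[k]` is the state after `k` layers. During the encoding phase the first coordinate stays frozen at its value from the end of the initial phase, so the state dimension is formally constant while the effective width shrinks and then grows again.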

Paper Structure

This paper contains 18 sections, 19 theorems, 143 equations, 11 figures, 1 table, and 1 algorithm.

Key Result

Proposition 2.1

For every $t \in [0,T]$ and for every $\theta \in L^2([0,T], \mathbb{R}^m)$, let ${\mathcal{F}}$ satisfy Assumption ass:block1. Then, the flow $\Phi^\theta_{(0,t)}: \mathbb{R}^d \to \mathbb{R}^d$ is well-defined for any $x_0 \in \mathbb{R}^d$ and it satisfies the following properties.
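
For orientation, the flow in this statement can be read as the solution map of the controlled ODE. The display below is a sketch under the assumption that $\mathcal{F}: [0,T] \times \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^d$ denotes the controlled velocity field, a signature that this excerpt does not state explicitly:

$$\dot{x}(s) = \mathcal{F}\big(s, x(s), \theta(s)\big) \ \text{ for a.e. } s \in [0,T], \qquad x(0) = x_0, \qquad \Phi^\theta_{(0,t)}(x_0) := x(t).$$

Under this reading, $\Phi^\theta_{(0,t)}$ sends an initial datum $x_0$ to the state reached at time $t$ when the network parameters follow the control $\theta$.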

Figures (11)

  • Figure 1: Left: network with an encoder structure. Right: Autoencoder.
  • Figure 2: Left: Embedding of an encoder into a dynamical system. Right: model for an Autoencoder.
  • Figure 3: Embedding of the U-net into a higher-dimensional dynamical system.
  • Figure 4: Left: Classification task performed when the turned off component is the natural one. Right: sketch of the AutoencODE architecture considered.
  • Figure 5: Left: Initial phase, i.e., separation of the data along the $y$-axis. Center: Encoding phase, i.e., only the second component is active. Right: Decoding phase and classification result after the "unnatural turn off". Note that, to obtain a cleaner clustering of the classified data, we increased the number of layers from $20$ to $40$. However, the network accomplishes the task even with the same structure as in Figure 4.
  • ...and 6 more figures

Theorems & Definitions (48)

  • Definition 1.1
  • Proposition 2.1
  • proof
  • Remark 2.1
  • Remark 2.2
  • Definition 3.1
  • Proposition 3.2
  • proof
  • Proposition 3.3
  • proof
  • ...and 38 more