Controllability of continuous networks and a kernel-based learning approximation

Michael Herty, Chiara Segala, Giuseppe Visconti

TL;DR

This work presents theoretical results on the controllability of the linear microscopic and mean-field dynamics, obtained through the Hilbert Uniqueness Method, and proposes a computational approach based on kernel learning methods to solve the training problem numerically and efficiently.

Abstract

Residual deep neural networks are formulated as interacting particle systems, leading to a description through neural differential equations and, in the case of large input data, through mean-field neural networks. The mean-field description also allows the training process to be recast as a controllability problem for the solution to the mean-field dynamics. We show theoretical results on the controllability of the linear microscopic and mean-field dynamics through the Hilbert Uniqueness Method and propose a computational approach based on kernel learning methods to solve the training problem numerically and efficiently. Further aspects of the structural properties of the mean-field equation are also reviewed.

Paper Structure

This paper contains 6 sections, 7 theorems, 42 equations, 5 figures.

Key Result

Proposition

Let $u \in L^2 ((t_0,T); \mathbb{R}^d)$. Let $x: [t_0,T]\rightarrow\mathbb{R}^{Md}$ be the solution of the Cauchy problem defined by eq:system_Coron with initial condition $x(t_0) = 0$. Let $\lambda^T\in \mathbb{R}^{Md}$ and let $\lambda: [t_0,T]\rightarrow\mathbb{R}^{Md}$ be the solution of the Cau

Figures (5)

  • Figure 1.1: Time evolution of $\Phi(t)$ towards the target value $y = 0$ for different weight configurations. The blue solid line (a) and the red dashed line (b) represent Case (1) with $\alpha = 0$ and $\alpha = 4$, respectively. The green circle line (c) represents Case (2) with $\alpha = 4$.
  • Figure 1.2: Microscopic case. Leftmost panel: exact loss functional computed on the parameter grid. Center-left panel: approximated loss functional via kernel method with $N=20$ points. Center-right panel: comparison between $\mathcal{L}$ and $\widehat{\mathcal{L}}$. Rightmost panel: relative error between $\mathcal{L}$ and $\widehat{\mathcal{L}}$.
  • Figure 1.3: Microscopic case. Left: $w$ vs. the cost during the iteration of the gradient method. Center: $b$ vs. the cost during the iteration of the gradient method. Right: evolution of the mean of the data with different parameters.
  • Figure 1.4: Mean-field case. Leftmost panel: exact loss functional computed on the parameter grid. Center-left panel: approximated loss functional via kernel method with $N=20$ points. Center-right panel: comparison between $\mathcal{L}$ and $\widehat{\mathcal{L}}$. Rightmost panel: relative error between $\mathcal{L}$ and $\widehat{\mathcal{L}}$.
  • Figure 1.5: Mean-field case. Left: $w$ vs. the cost during the iteration of the gradient method. Center: $b$ vs. the cost during the iteration of the gradient method. Right: evolution of the distributions with different parameters.
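
The figure captions above describe approximating a loss functional $\mathcal{L}$ over a grid of parameters $(w, b)$ by a kernel method with $N=20$ points. The following is a minimal sketch of that general idea only, not the authors' implementation. Everything specific in it is an assumption made for illustration: the toy dynamics $\dot{x} = w x + b$, the loss on the mean of the data, the Gaussian kernel, and the names `toy_loss`, `gaussian_kernel`, `fit_kernel_surrogate`, `length_scale` and `reg`.

```python
import numpy as np

def toy_loss(w, b, x0, y=0.0, T=1.0, steps=50):
    """Hypothetical loss (an assumption, not the paper's functional):
    forward-Euler integrate dx/dt = w*x + b and penalize the distance
    of the mean of x(T) from the target y."""
    x = x0.copy()
    dt = T / steps
    for _ in range(steps):
        x = x + dt * (w * x + b)
    return 0.5 * (x.mean() - y) ** 2

def gaussian_kernel(P, Q, length_scale):
    """Gaussian kernel matrix between point sets P (n, d) and Q (m, d)."""
    d2 = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * length_scale ** 2))

def fit_kernel_surrogate(params, values, length_scale=0.5, reg=1e-8):
    """Regularized kernel interpolant of `values` sampled at `params`."""
    K = gaussian_kernel(params, params, length_scale)
    coeffs = np.linalg.solve(K + reg * np.eye(len(params)), values)

    def surrogate(query):
        Kq = gaussian_kernel(np.atleast_2d(query), params, length_scale)
        return Kq @ coeffs

    return surrogate

rng = np.random.default_rng(0)
x0 = rng.normal(loc=1.0, scale=0.5, size=200)  # synthetic input data

# Evaluate the expensive loss at only N = 20 sampled parameter pairs (w, b).
N = 20
samples = rng.uniform(-1.0, 1.0, size=(N, 2))
values = np.array([toy_loss(w, b, x0) for w, b in samples])
surrogate = fit_kernel_surrogate(samples, values)

# Compare the surrogate with the exact loss on a parameter grid.
w_grid, b_grid = np.meshgrid(np.linspace(-1, 1, 25), np.linspace(-1, 1, 25))
grid = np.column_stack([w_grid.ravel(), b_grid.ravel()])
exact = np.array([toy_loss(w, b, x0) for w, b in grid])
approx = surrogate(grid)
max_abs_err = np.abs(exact - approx).max()
```

In this sketch the surrogate is cheap to evaluate, so a gradient method over $(w, b)$ could query it instead of re-integrating the dynamics at every iteration. The quality of the approximation depends on the assumed kernel width and on where the $N$ samples fall.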

Theorems & Definitions (16)

  • Definition: Microscopic controllability
  • Proposition
  • Theorem 2
  • Corollary
  • Proof 1
  • Definition: Mean-field controllability
  • Proposition
  • Proof 2
  • Theorem 3
  • Proposition
  • ...and 6 more