Differential Equations for Continuous-Time Deep Learning

Lars Ruthotto

TL;DR

This short, self-contained article introduces and surveys continuous-time deep learning approaches based on neural ordinary differential equations (neural ODEs), and shows how they can provide new insights into deep learning and a foundation for more efficient algorithms.

Abstract

This short, self-contained article seeks to introduce and survey continuous-time deep learning approaches that are based on neural ordinary differential equations (neural ODEs). It primarily targets readers familiar with ordinary and partial differential equations and their analysis who are curious to see their role in machine learning. Using three examples from machine learning and applied mathematics, we will see how neural ODEs can provide new insights into deep learning and a foundation for more efficient algorithms.

Paper Structure (7 sections, 25 equations, 3 figures, 1 table)

Figures (3)

  • Figure 1: Illustration of a continuous-time model for binary classification. Left column: input features and labels. Center column: propagated features given by the final state of the ODE and the hyperplane given by $W,b$. Right column: labels predicted by the neural network. The rows show two instances of the problem using the original two-dimensional features and the augmented features (padded with one zero), respectively. While both models agree closely around the samples, we highlight some errors of the original model that arise from the restriction to invertible maps in $n=2$. This example demonstrates that augmenting the features overcomes the need for a non-invertible mapping. Note that the models may not be reliable in regions with no data points, for example, in the top right corner of the domain.
  • Figure 2: Illustration of the generative modeling problem. Given samples from the target distribution (represented by blue dots), we try to find an invertible transformation (represented by red lines) to a simple reference distribution (a standard Gaussian).
  • Figure 3: Illustration of potential mean field game versions of relaxed dynamic optimal transport (left) and crowd motion problems (right). Both cases use the standard Gaussian reference $\pi_X$ (top) and the same Gaussian mixture as the target (bottom). As expected, in the optimal transport case, the trajectories are straight, whereas in the crowd motion case, the agents' trajectories curve to avoid an obstacle in the center of the domain. This example also shows that different dynamics can produce the same map $F_\theta$.
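
The continuous-time classifier described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the one-layer velocity field `tanh(u K^T + b)`, the forward Euler discretization, the step count, and all function and variable names other than $W$ and $b$ are hypothetical choices made for the example (the hyperplane offset is called `c` here to avoid clashing with the velocity bias `b`), and the weights are random rather than trained.

```python
import numpy as np


def augment(x, extra=1):
    """Pad each feature vector with `extra` zeros (n = 2 becomes n = 3 for extra=1)."""
    return np.concatenate([x, np.zeros((x.shape[0], extra))], axis=1)


def velocity(u, K, b):
    """Hypothetical one-layer velocity field v(u) = tanh(u K^T + b)."""
    return np.tanh(u @ K.T + b)


def propagate(u0, K, b, T=1.0, steps=100):
    """Forward Euler approximation of du/dt = v(u), u(0) = u0, on [0, T]."""
    h = T / steps
    u = u0.copy()
    for _ in range(steps):
        u = u + h * velocity(u, K, b)
    return u


def predict(x, K, b, W, c, extra=1):
    """Augment, propagate through the ODE, then classify with the hyperplane (W, c)."""
    u_final = propagate(augment(x, extra), K, b)
    logits = u_final @ W + c
    return 1.0 / (1.0 + np.exp(-logits))  # probability of the positive class


# Untrained, randomly initialized parameters (assumption: illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))          # five two-dimensional samples
n = 3                                # state dimension after padding with one zero
K = 0.5 * rng.normal(size=(n, n))    # velocity weights
b = 0.1 * rng.normal(size=n)         # velocity bias
W = rng.normal(size=n)               # hyperplane normal
c = 0.0                              # hyperplane offset

u0 = augment(x)
p = predict(x, K, b, W, c)
```

Setting `extra=0` (with correspondingly two-dimensional `K`, `b`, and `W`) gives the non-augmented model in the top row of the figure, where the ODE flow is restricted to invertible maps of the plane.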