Deep learning and the rate of approximation by flows

Jingpu Cheng, Qianxiao Li, Ting Lin, Zuowei Shen

Abstract

We investigate how the approximation capacity of deep residual networks depends on their depth in a continuous dynamical systems setting. This can be formulated as the general problem of quantifying the minimal time-horizon required to approximate a diffeomorphism by flows driven by a given family $\mathcal F$ of vector fields. We show that this minimal time can be identified as a geodesic distance on a sub-Finsler manifold of diffeomorphisms, where the local geometry is characterised by a variational principle involving $\mathcal F$. This connects the efficiency with which a target relationship can be learned to its compatibility with the chosen architecture. Further, the results suggest that the key approximation mechanism in deep learning, namely the approximation of functions by composition or dynamics, differs in a fundamental way from linear approximation theory: linear spaces and norm-based rate estimates are replaced by manifolds and geodesic distances.
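As a rough illustration of the setting described in the abstract, the sketch below treats a residual network as a forward-Euler discretisation of a flow over a time horizon $T$. It is a minimal sketch under stated assumptions, not the authors' construction: the scalar family `vector_field` (a scaled `tanh`), the helper `flow`, and the parameter values are all hypothetical choices made only so the example has a closed-form solution to compare against.

```python
import math

def vector_field(x, w, a, b):
    """One hypothetical member of a control family: x -> w * tanh(a*x + b)."""
    return w * math.tanh(a * x + b)

def flow(x0, controls, horizon):
    """Forward-Euler discretisation of dx/dt = f_theta(t)(x) on [0, horizon].

    Each entry of `controls` is one piecewise-constant parameter choice,
    so len(controls) plays the role of network depth and each Euler step
    is one residual block x <- x + h * f(x).
    """
    h = horizon / len(controls)
    x = x0
    for (w, a, b) in controls:
        x = x + h * vector_field(x, w, a, b)
    return x

# Hold the same vector field dx/dt = tanh(x) over the whole horizon T = 1
# and vary only the number of residual blocks.
theta = (1.0, 1.0, 0.0)
shallow = flow(0.5, [theta] * 4, horizon=1.0)
deep = flow(0.5, [theta] * 400, horizon=1.0)

# Closed-form time-1 flow of dx/dt = tanh(x): sinh(x(t)) = sinh(x0) * exp(t).
exact = math.asinh(math.sinh(0.5) * math.e)

print(abs(shallow - exact), abs(deep - exact))
```

In this toy example, adding blocks at a fixed horizon only refines the discretisation of the same time-1 flow map. In the continuous idealisation, the quantity of interest is instead how large the horizon $T$ must be before some choice of controls brings the flow map close to a given target.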

Paper Structure (23 sections, 26 theorems, 302 equations, 2 figures)

Key Result

Theorem 2.1

Suppose $(\mathcal{M}, \mathcal{F})$ is a compatible pair, as defined in Definition def:compatible-pair. Then the flow-complexity metric $d_{\mathcal{F}}$ coincides with the geodesic distance induced by the local norm eq:local_norm_variational. In particular, for all $\psi_1,\psi_2\in\mathcal{M}$, $d_{\mathcal{F}}(\psi_1,\psi_2)$ equals the infimum of the lengths, measured in this local norm, of the curves in $\mathcal{M}$ joining $\psi_1$ to $\psi_2$. The proof is given in sec:main_results_general.

Figures (2)

  • Figure 1: Diagram illustrating the connection built in the main result between the approximation complexity by flows and sub-Finsler geometry
  • Figure 2: Comparison of distance contours between $d_{\mathcal{F}}$ and $L^2$ on the convex hull of three functions in $\mathcal{M}$

Theorems & Definitions (59)

  • Theorem 2.1
  • Remark 2.1
  • Proposition 3.1: Proposition 4.8 in li2022deep
  • Proposition 3.2
  • Theorem 3.1
  • Lemma 3.1
  • proof
  • Proposition 3.3
  • proof
  • Lemma 3.2
  • ...and 49 more