On the Generalization and Approximation Capacities of Neural Controlled Differential Equations

Linus Bleistein, Agathe Guilloux

TL;DR

This work provides the first theoretical analysis of Neural Controlled Differential Equations (NCDEs) for irregular time series. It establishes a sampling-dependent generalization bound by leveraging the Lipschitz continuity of NCDE flows and bounding the covering/Rademacher complexities of the predictor class. It then decomposes the total risk under a well-specified CDE model into discretization bias and approximation bias, deriving bounds via flow-continuity and linking neural approximation results to NCDEs. Numerical experiments corroborate that discretization gaps and input path variation measurably influence generalization, aligning with the theoretical predictions. Collectively, the results offer principled guidance on NCDE design, discretization choices, and how irregular sampling impacts learning performance in practice, with potential extensions to broader control-theoretic learning problems.

Abstract

Neural Controlled Differential Equations (NCDEs) are a state-of-the-art tool for supervised learning with irregularly sampled time series (Kidger, 2020). However, no theoretical analysis of their performance has been provided yet, and it remains unclear in particular how the irregularity of the time series affects their predictions. By merging the rich theory of controlled differential equations (CDEs) and Lipschitz-based measures of the complexity of deep neural nets, we take a first step towards the theoretical understanding of NCDEs. Our first result is a generalization bound for this class of predictors that depends on the regularity of the time series data. Second, we leverage the continuity of the flow of CDEs to provide a detailed analysis of both the sampling-induced bias and the approximation bias. Regarding this last result, we show how classical approximation results on neural nets may transfer to NCDEs. Our theoretical results are validated through a series of experiments.

Paper Structure

This paper contains 70 sections, 22 theorems, 195 equations, 5 figures, 1 table.

Key Result

Lemma 3.3

The value $f_\theta(x)$ is uniformly upper bounded by […] and $f_\theta(\mathbf{x}^D)$ is upper bounded by […], where $\kappa_\Theta(\mathbf{0}) = L_\sigma B_\mathbf{b} \frac{(L_\sigma B_\mathbf{A})^{q}-1}{L_\sigma B_\mathbf{A}-1}$ is an upper bound of $\left\lVert\mathbf{G}_\psi(\mathbf{0})\right\rVert_{\textnormal{op}} =: \max\limits_{\left\lVert u\right\rVert = 1} \left\lVert\mathbf{G}_\psi(\mathbf{0})u\right\rVert$ […]
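
As a reading aid for the constant $\kappa_\Theta(\mathbf{0})$ in Lemma 3.3, note that its closed form is a finite geometric sum. Assuming $L_\sigma B_\mathbf{A} \neq 1$ (the interpretation of $q$ as the depth of the neural vector field, and of $B_\mathbf{b}$, $B_\mathbf{A}$ and $L_\sigma$ as bounds on the biases, the weights and the Lipschitz constant of the activation, is our assumption from the notation):

$$
\kappa_\Theta(\mathbf{0}) = L_\sigma B_\mathbf{b} \, \frac{(L_\sigma B_\mathbf{A})^{q}-1}{L_\sigma B_\mathbf{A}-1} = L_\sigma B_\mathbf{b} \sum_{j=0}^{q-1} (L_\sigma B_\mathbf{A})^{j}.
$$

For $q=1$ this reduces to $L_\sigma B_\mathbf{b}$, and for $q=2$ to $L_\sigma B_\mathbf{b}\,(1 + L_\sigma B_\mathbf{A})$. The constant therefore grows geometrically with $q$ when $L_\sigma B_\mathbf{A} > 1$, and stays below $L_\sigma B_\mathbf{b} / (1 - L_\sigma B_\mathbf{A})$ when $L_\sigma B_\mathbf{A} < 1$.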

Figures (5)

  • Figure 1: 2D latent state of a CDE driven by the smooth path $x_t = (t,t)$, i.e. a regular ResNet, on the left and a CDE driven by a Brownian motion with drift $x_t = (t+W^{(1)}_t,t+W^{(2)}_t)$ on the right. The square markers indicate the initial values of the latent states. The color gradient indicates evolution through time.
  • Figure 2: Left: in bold, we plot the solutions of two shallow $(q=1)$ NCDEs that differ only in their vector fields $\mathbf{A}^1$ and $\mathbf{A}^2$. We then plot the solutions of the NCDEs with interpolated vector field $\delta \mathbf{A}^1 + (1-\delta)\mathbf{A}^2$ for $\delta \in [0,1]$ in red-blue gradient. Center: we consider a given NCDE and interpolate linearly between two initial conditions $x^1_0$ and $x^2_0$. Right: we consider a given NCDE and drive it with the linear interpolation of two paths $(x^1_t)_t$ and $(x^2_t)_t$. In all three cases, the solutions evolve continuously as we interpolate between the models.
  • Figure 3: Left: Evolution of the prediction $(\Phi^\top z_t)_t$ of an NCDE with shallow neural vector field at different moments of the training process; the true process $(\Phi_\star^\top z^\star_t)_t$ is shown in red. Center: Evolution of the parameters' norms, normalized by their value at initialization, during training. Right: Parameters' norms at the end of training, without normalization, over $25$ training instances of the same model. Training is performed with Adam (Kingma and Ba, 2014).
  • Figure 4: Left: latent state $(\Phi^\top z_t)_t$ at initialization (top) and after $10$ (middle) and $100$ (bottom) training steps as the NCDE learns to separate rough (in red) from smooth (in blue) paths on a fine grid. The figure shows examples from the test set. Center: Generalization error $\lvert n^{-1}\sum \ell(\mathbf{x}^{D,i},y^i) - \mathbb{E}_{\mathbf{x}^D,y}\, \ell(\mathbf{x}^D,y)\rvert$ vs. sampling gap. Right: Generalization error vs. average maximal path variation. We train $300$ NCDEs, with $\Phi$ and the initialization layer left untrained to isolate the effect of the discretization, with time series downsampled to $K=5$ points randomly chosen in $[0,1]$ for each run. Training is done with Adam with default parameters (Kingma and Ba, 2014).
  • Figure 5: A schematic illustration of the main bounded sets used in the proof. The initial values of the time series used for initializing the NCDE lie in the orange ball of diameter $B_x$. The value $z_0$ then lies in the blue ball $B_1$, whose diameter is upper bounded by a function of $B_\mathbf{U},B_v,B_x$ and $L_\sigma$. Finally, the trajectories evolve over time but stay within the ball $\Omega^D$.
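
To make the setting of Figure 1 concrete, here is a minimal sketch of how a 2D latent state driven by the two paths in that caption could be simulated. It is not the authors' code: it assumes an explicit Euler scheme for $dz_t = \mathbf{G}(z_t)\,dx_t$, and the helpers `euler_cde` and `make_tanh_field`, the one-layer tanh vector field, the grid size and the random seed are all hypothetical choices made for illustration.

```python
import numpy as np

def euler_cde(vector_field, z0, path):
    """Explicit Euler scheme z_{k+1} = z_k + G(z_k) (x_{k+1} - x_k)
    for the CDE dz_t = G(z_t) dx_t (an assumed discretization)."""
    z = np.asarray(z0, dtype=float)
    states = [z]
    for dx in np.diff(path, axis=0):      # increments of the driving path
        z = z + vector_field(z) @ dx      # (p, d) matrix times (d,) increment
        states.append(z)
    return np.stack(states)               # shape (K + 1, p)

def make_tanh_field(A, b, p, d):
    """Hypothetical shallow vector field G(z) = tanh(A z + b), reshaped to (p, d)."""
    def field(z):
        return np.tanh(A @ z + b).reshape(p, d)
    return field

rng = np.random.default_rng(0)
p, d, K = 2, 2, 1000                      # latent dim, path dim, number of steps
t = np.linspace(0.0, 1.0, K + 1)

# Smooth path x_t = (t, t), as in the left panel of Figure 1.
smooth_path = np.stack([t, t], axis=1)

# Brownian motion with drift x_t = (t + W1_t, t + W2_t), as in the right panel.
increments = rng.normal(scale=np.sqrt(1.0 / K), size=(K, d))
brownian = np.vstack([np.zeros((1, d)), np.cumsum(increments, axis=0)])
rough_path = smooth_path + brownian

A = rng.normal(size=(p * d, p))           # illustrative random weights
b = rng.normal(size=p * d)                # illustrative random biases
field = make_tanh_field(A, b, p, d)

z0 = np.zeros(p)
z_smooth = euler_cde(field, z0, smooth_path)
z_rough = euler_cde(field, z0, rough_path)
```

Plotting the two columns of `z_smooth` and `z_rough` against each other should give trajectories qualitatively similar to the two panels, although the exact shapes depend on the assumed vector field and seed.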

Theorems & Definitions (31)

  • Definition 3.1
  • Definition 3.2
  • Lemma 3.3
  • Remark 3.4
  • Remark 3.5
  • Theorem 4.1
  • Proposition 4.2
  • Remark 5.1
  • Lemma 5.2
  • Lemma 5.3
  • ...and 21 more