Continuum Dropout for Neural Differential Equations

Jonghun Lee, YongKyung Oh, Sungil Kim, Dong-Young Lim

TL;DR

Continuum Dropout redefines dropout for Neural Differential Equations as a continuous-time on-off gate governed by an alternating renewal process. It yields a universal regularization framework for NDEs and enables principled uncertainty quantification through test-time Monte Carlo sampling. Two hyperparameters, the dropout rate $p$ and the renewal count $m$, determine the rates $(\lambda_1,\lambda_2)$ that control the active and inactive durations, with closed-form approximations available for large horizons. Empirically, Continuum Dropout improves generalization and calibration on time-series and image benchmarks and applies broadly across Neural ODEs, CDEs, and SDEs, with manageable computational overhead during inference.

Abstract

Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their advantages, NDEs face a fundamental challenge in adopting dropout, a cornerstone of deep learning regularization, making them susceptible to overfitting. To address this research gap, we introduce Continuum Dropout, a universally applicable regularization technique for NDEs built upon the theory of alternating renewal processes. Continuum Dropout formulates the on-off mechanism of dropout as a stochastic process that alternates between active (evolution) and inactive (paused) states in continuous time. This provides a principled approach to prevent overfitting and enhance the generalization capabilities of NDEs. Moreover, Continuum Dropout offers a structured framework to quantify predictive uncertainty via Monte Carlo sampling at test time. Through extensive experiments, we demonstrate that Continuum Dropout outperforms existing regularization methods for NDEs, achieving superior performance on various time series and image classification tasks. It also yields better-calibrated and more trustworthy probability estimates, highlighting its effectiveness for uncertainty-aware modeling.
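To make the on-off mechanism and the test-time Monte Carlo procedure concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each latent component has its own gate, that every gate starts in the active state, that period lengths are exponential, and that a paused component simply stops evolving. The function names (`sample_gate`, `gated_euler`, `mc_predict`), the Euler discretization, and the toy vector field $f(z,t) = -z$ are all hypothetical choices made for illustration.

```python
import numpy as np

def sample_gate(rng, lam_active, lam_inactive, t_grid):
    """Sample a 0/1 gate on t_grid from an alternating renewal process.

    Assumption: the gate starts active at t = 0; active periods last
    Exp(lam_active), inactive periods last Exp(lam_inactive).
    """
    gate = np.empty(len(t_grid))
    active = True
    switch_time = rng.exponential(1.0 / lam_active)  # end of first active period
    for k, t in enumerate(t_grid):
        while t >= switch_time:
            active = not active
            rate = lam_active if active else lam_inactive
            switch_time += rng.exponential(1.0 / rate)
        gate[k] = 1.0 if active else 0.0
    return gate

def gated_euler(f, z0, t_grid, gates):
    """Euler steps of dz_i/dt = gate_i(t) * f_i(z, t); paused components stay put."""
    z = np.array(z0, dtype=float)
    for k in range(len(t_grid) - 1):
        dt = t_grid[k + 1] - t_grid[k]
        z = z + dt * gates[:, k] * f(z, t_grid[k])
    return z

def mc_predict(f, z0, t_grid, lam_active, lam_inactive, n_samples, seed=0):
    """Monte Carlo mean and standard deviation of the terminal state."""
    rng = np.random.default_rng(seed)
    outs = []
    for _ in range(n_samples):
        # One independent gate per latent component.
        gates = np.stack([sample_gate(rng, lam_active, lam_inactive, t_grid)
                          for _ in z0])
        outs.append(gated_euler(f, z0, t_grid, gates))
    outs = np.array(outs)
    return outs.mean(axis=0), outs.std(axis=0)

# Toy usage: linear decay with two latent components.
t_grid = np.linspace(0.0, 1.0, 101)
mean, std = mc_predict(lambda z, t: -z, [1.0, 2.0], t_grid, 2.0, 8.0, 200)
```

The spread `std` across sampled gate paths is what a Monte Carlo uncertainty estimate would be built from. In a trained model the terminal states would be passed through a prediction head before averaging.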

Paper Structure

This paper contains 51 sections, 7 theorems, 57 equations, 10 figures, 22 tables, 1 algorithm.

Key Result

Theorem 3.1

For fixed $T>0$, let $\{{\mathbf{z}}(t)\}_{0\leq t \leq T}$ be the latent process with Continuum Dropout where the active and inactive period lengths, $\{X_n^{(i)}\}_{n\geq 1}$ and $\{Y_n^{(i)}\}_{n\geq 1}$, are i.i.d. exponential random variables with rates $\lambda_1$ and $\lambda_2$, respectively
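The theorem statement is cut off in this summary, so its conclusion is not reproduced here. As a rough illustration of the setting it describes, the sketch below simulates a single gate with exponential active and inactive periods and estimates the probability that the gate is inactive at time $T$. The comparison value is the standard two-state continuous-time Markov chain formula, $\frac{\lambda_1}{\lambda_1+\lambda_2}\left(1 - e^{-(\lambda_1+\lambda_2)T}\right)$, which holds when the gate starts active. Both the starting state and the use of this formula as a reference are assumptions of the sketch, not quotations from the paper, and the function names are hypothetical.

```python
import numpy as np

def inactive_at(rng, lam1, lam2, T):
    """Return True if a gate that starts active is inactive at time T.

    Active periods are Exp(lam1), inactive periods are Exp(lam2).
    """
    t, active = 0.0, True
    while True:
        t += rng.exponential(1.0 / (lam1 if active else lam2))
        if t > T:
            return not active  # current period covers T
        active = not active

def empirical_inactive_prob(lam1, lam2, T, n_paths=20000, seed=0):
    """Fraction of simulated paths that are inactive at time T."""
    rng = np.random.default_rng(seed)
    return np.mean([inactive_at(rng, lam1, lam2, T) for _ in range(n_paths)])

def markov_inactive_prob(lam1, lam2, T):
    """Two-state Markov chain probability of being inactive at T (start active)."""
    s = lam1 + lam2
    return lam1 / s * (1.0 - np.exp(-s * T))

p_sim = empirical_inactive_prob(1.0, 3.0, 2.0)
p_ref = markov_inactive_prob(1.0, 3.0, 2.0)
```

With these rates the simulated and reference values should agree up to Monte Carlo error, and both approach $\lambda_1/(\lambda_1+\lambda_2)$ as $T$ grows.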

Figures (10)

  • Figure 1: Illustration of dropout in discrete neural networks and continuous-time latent processes: (a) discrete neural network, (b) neural network with dropout, (c) NDE, (d) NDE with Continuum Dropout.
  • Figure 2: Illustration of the $i$-th component of the latent process ${\mathbf{z}}(t)$ with Continuum Dropout.
  • Figure 3: Reliability diagrams illustrating calibration performance on CIFAR-100 and Speech Commands datasets.
  • Figure 4: Performance with Different Numbers of MC Simulation Samples on Speech Commands
  • Figure 5: Performance comparison with different hyperparameters on extended datasets
  • ...and 5 more figures

Theorems & Definitions (10)

  • Theorem 3.1
  • Corollary 3.2
  • Definition A.1
  • Theorem A.2
  • Theorem A.3
  • Theorem A.4
  • Theorem A.5
  • Theorem A.6
  • Proof of Theorem 3.1
  • Proof of Corollary 3.2