Continuum Dropout for Neural Differential Equations
Jonghun Lee, YongKyung Oh, Sungil Kim, Dong-Young Lim
TL;DR
Continuum Dropout redefines dropout for Neural Differential Equations as a continuous-time on-off gate governed by an alternating renewal process. It yields a universal regularization framework for NDEs and enables principled uncertainty quantification through test-time Monte Carlo sampling. Two hyperparameters, the dropout rate $p$ and the renewal count $m$, define the rates $(\lambda_1,\lambda_2)$ that control the active and inactive durations, with closed-form approximations available for large time horizons. Empirically, Continuum Dropout improves generalization and calibration on time-series and image benchmarks and applies across Neural ODEs, CDEs, and SDEs, with manageable computational overhead during inference.
Abstract
Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their advantages, NDEs face a fundamental challenge in adopting dropout, a cornerstone of deep learning regularization, making them susceptible to overfitting. To address this research gap, we introduce Continuum Dropout, a universally applicable regularization technique for NDEs built upon the theory of alternating renewal processes. Continuum Dropout formulates the on-off mechanism of dropout as a stochastic process that alternates between active (evolution) and inactive (paused) states in continuous time. This provides a principled approach to prevent overfitting and enhance the generalization capabilities of NDEs. Moreover, Continuum Dropout offers a structured framework to quantify predictive uncertainty via Monte Carlo sampling at test time. Through extensive experiments, we demonstrate that Continuum Dropout outperforms existing regularization methods for NDEs, achieving superior performance on various time series and image classification tasks. It also yields better-calibrated and more trustworthy probability estimates, highlighting its effectiveness for uncertainty-aware modeling.
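To make the on-off mechanism and the test-time Monte Carlo sampling concrete, the following Python sketch simulates a gate that alternates between active and inactive states and uses it to pause a toy ODE. It is an illustration under stated assumptions, not the authors' implementation: holding times are assumed exponential, $\lambda_1$ is taken as the rate of leaving the active state and $\lambda_2$ as the rate of leaving the inactive state, the gate is assumed to start active, integration uses fixed-step Euler, and the vector field $f(z) = -z$ is a placeholder for a learned network. All function names are hypothetical, and the mapping from $(p, m)$ to $(\lambda_1,\lambda_2)$ is not reproduced here.

```python
import numpy as np


def sample_gate_switch_times(lam_active, lam_inactive, horizon, rng):
    """Sample the switch times of an on-off gate on [0, horizon].

    Assumption: holding times are exponential. lam_active is the rate of
    leaving the active state, lam_inactive the rate of leaving the inactive
    state. The gate is assumed to start in the active state at t = 0.
    """
    times = []
    t, active = 0.0, True
    while True:
        rate = lam_active if active else lam_inactive
        t += rng.exponential(1.0 / rate)  # mean holding time is 1 / rate
        if t >= horizon:
            break
        times.append(t)
        active = not active
    return np.array(times)


def gate_value(t, switch_times):
    """Return 1.0 if the gate is active at time t, else 0.0."""
    n_switches = np.searchsorted(switch_times, t, side="right")
    return 1.0 if n_switches % 2 == 0 else 0.0


def inactive_fraction(switch_times, horizon):
    """Fraction of [0, horizon] that the gate spends in the inactive state."""
    edges = np.concatenate(([0.0], switch_times, [horizon]))
    durations = np.diff(edges)
    return durations[1::2].sum() / horizon  # odd-indexed intervals are inactive


def euler_gated(f, z0, switch_times, horizon, n_steps=1000):
    """Integrate dz/dt = gate(t) * f(z) with fixed-step Euler.

    While the gate is inactive the state is held constant (paused).
    """
    dt = horizon / n_steps
    z = np.asarray(z0, dtype=float).copy()
    for k in range(n_steps):
        z = z + dt * gate_value(k * dt, switch_times) * f(z)
    return z


def mc_predict(f, z0, lam_active, lam_inactive, horizon, n_samples=200, seed=0):
    """Monte Carlo estimate of the final state: mean and std over gate samples."""
    rng = np.random.default_rng(seed)
    finals = np.stack([
        euler_gated(
            f, z0,
            sample_gate_switch_times(lam_active, lam_inactive, horizon, rng),
            horizon,
        )
        for _ in range(n_samples)
    ])
    return finals.mean(axis=0), finals.std(axis=0)


if __name__ == "__main__":
    # Toy vector field standing in for a learned network (assumption).
    mean, std = mc_predict(lambda z: -z, [1.0], 1.0, 3.0, horizon=2.0)
    print("predictive mean:", mean, "predictive std:", std)
```

Under these assumptions, the long-run fraction of time the gate spends inactive tends to $\lambda_1/(\lambda_1+\lambda_2)$, a standard property of alternating renewal processes, so the two rates jointly set how much of the trajectory is paused. The spread of the sampled final states plays the role of the predictive uncertainty estimate.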
