Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise

Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, Cho-Jui Hsieh

TL;DR

This work addresses the lack of regularization in Neural ODEs by introducing Neural SDEs that inject stochastic noise into the continuous dynamics. It provides a framework to model dropout, Gaussian, and other regularization schemes as diffusion terms, along with a memory-efficient pathwise gradient-based training method and a stability analysis showing robustness gains. Empirically, Neural SDE improves generalization on CIFAR-10 and enhances resistance to both adversarial and non-adversarial perturbations, with additional gains when using testing-time noise ensembles. Overall, the approach offers a principled, drop-in mechanism to stabilize and regularize continuous-depth networks with tangible performance and robustness benefits.

Abstract

The Neural Ordinary Differential Equation (Neural ODE) has been proposed as a continuous approximation to the ResNet architecture. Some regularization mechanisms commonly used in discrete neural networks (e.g. dropout, Gaussian noise) are missing in current Neural ODE networks. In this paper, we propose a new continuous neural network framework called the Neural Stochastic Differential Equation (Neural SDE) network, which naturally incorporates various commonly used regularization mechanisms based on random noise injection. Our framework can model various types of noise injection frequently used in discrete networks for regularization purposes, such as dropout and additive/multiplicative noise in each block. We provide a theoretical analysis explaining the improved robustness of Neural SDE models against input perturbations and adversarial attacks. Furthermore, we demonstrate that the Neural SDE network can achieve better generalization than the Neural ODE and is more resistant to adversarial and non-adversarial input perturbations.
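
To make the idea of noise injection in continuous dynamics concrete, the following is a minimal sketch of an Euler-Maruyama forward pass for an SDE of the form $\mathrm{d}{\bm{h}}_t = {\bm{f}}({\bm{h}}_t, t)\,\mathrm{d}t + {\bm{G}}({\bm{h}}_t, t)\,\mathrm{d}{\bm{B}}_t$ with diagonal noise. It is an illustration under stated assumptions, not the authors' implementation: the function `euler_maruyama`, the single-layer `drift`, the weight matrix `W`, and the noise level `sigma=0.5` are all hypothetical choices made for this example.

```python
import numpy as np

def euler_maruyama(h0, drift, diffusion, t0=0.0, t1=1.0, n_steps=100, rng=None):
    """Integrate dh = drift(h, t) dt + diffusion(h, t) dB_t (diagonal noise)."""
    rng = np.random.default_rng() if rng is None else rng
    dt = (t1 - t0) / n_steps
    h = np.asarray(h0, dtype=float).copy()
    t = t0
    for _ in range(n_steps):
        # Brownian increment: independent N(0, dt) per hidden unit.
        dB = rng.normal(0.0, np.sqrt(dt), size=h.shape)
        h = h + drift(h, t) * dt + diffusion(h, t) * dB
        t += dt
    return h

# Hypothetical drift: one tanh layer with fixed random weights (assumption).
W = 0.5 * np.random.default_rng(42).standard_normal((4, 4))

def drift(h, t):
    return np.tanh(W @ h)

def no_noise(h, t):
    # Zero diffusion recovers a plain Euler-discretized Neural ODE.
    return np.zeros_like(h)

def multiplicative_noise(h, t, sigma=0.5):
    # Noise scaled by the hidden state itself (sigma is an assumed value).
    return sigma * h

h0 = np.ones(4)
h_ode = euler_maruyama(h0, drift, no_noise)
h_sde = euler_maruyama(h0, drift, multiplicative_noise, rng=np.random.default_rng(0))
```

With the diffusion set to zero the update reduces to the explicit Euler step of a Neural ODE, which is why the noisy model can be read as a drop-in extension of the deterministic one.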

Paper Structure

This paper contains 18 sections, 5 theorems, 30 equations, 4 figures, 2 tables.

Key Result

Theorem 3.1

For a continuously differentiable loss $\ell({\bm{h}}_{t_1})$, we can obtain an unbiased gradient estimator. Moreover, if we define ${\bm{\beta}}_t=\frac{\partial{\bm{h}}_{t}}{\partial{\bm{w}}}$, then ${\bm{\beta}}_t$ follows another SDE. It is easy to check that if ${\bm{G}}\equiv{\bm{0}}$, then our Monte-Carlo gradient estimator falls back to the exact gradient by back-propagation.
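
The theorem's displayed equations are not reproduced in this summary. As a hedged sketch of their likely shape, not the paper's verbatim statement, assume the hidden state follows an SDE with drift ${\bm{f}}$ and diffusion ${\bm{G}}$ (the symbols ${\bm{f}}$ and ${\bm{B}}_t$ are assumptions introduced here):

$$
\mathrm{d}{\bm{h}}_t = {\bm{f}}({\bm{h}}_t, t; {\bm{w}})\,\mathrm{d}t + {\bm{G}}({\bm{h}}_t, t; {\bm{w}})\,\mathrm{d}{\bm{B}}_t .
$$

The chain rule applied along a single sampled path then gives a per-path gradient in terms of ${\bm{\beta}}_{t_1}$:

$$
\frac{\partial \ell({\bm{h}}_{t_1})}{\partial {\bm{w}}} = \frac{\partial \ell({\bm{h}}_{t_1})}{\partial {\bm{h}}_{t_1}}\,{\bm{\beta}}_{t_1} .
$$

Formally differentiating the assumed SDE with respect to ${\bm{w}}$ suggests the sensitivity dynamics

$$
\mathrm{d}{\bm{\beta}}_t = \Big(\frac{\partial {\bm{f}}}{\partial {\bm{w}}} + \frac{\partial {\bm{f}}}{\partial {\bm{h}}_t}\,{\bm{\beta}}_t\Big)\mathrm{d}t + \Big(\frac{\partial {\bm{G}}}{\partial {\bm{w}}} + \frac{\partial {\bm{G}}}{\partial {\bm{h}}_t}\,{\bm{\beta}}_t\Big)\mathrm{d}{\bm{B}}_t ,
$$

where products involving the matrix-valued ${\bm{G}}$ are understood as tensor contractions, and ${\bm{\beta}}_{t_0}={\bm{0}}$ when the initial state does not depend on ${\bm{w}}$. Setting ${\bm{G}}\equiv{\bm{0}}$ removes the stochastic term and leaves the forward sensitivity ODE of a Neural ODE, consistent with the remark that the estimator then reduces to the exact gradient.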

Figures (4)

  • Figure 1: Toy example. By comparing the simulations under $\sigma=0$ and $\sigma=2.8$, we see adding noise to the system can be an effective way to control $x_t$. Average over multiple runs is used to cancel out the volatility during the early stage.
  • Figure 2: Our model architecture. The initial value of SDE is the output of a convolutional layer, and the value at time $T$ is passed to a linear classifier after average pooling.
  • Figure 3: Comparing the robustness against $\ell_2$-norm constrained adversarial perturbations, on CIFAR-10 (left), STL-10 (middle) and Tiny-ImageNet (right) data. We evaluate testing accuracy with three models, namely Neural ODE, Neural SDE with multiplicative noise and dropout noise.
  • Figure 4: Comparing the perturbations of hidden states, ${\bm{\varepsilon}}_t$, on both ODE and SDE (we choose dropout-style noise).
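
The stabilizing effect described in the Figure 1 caption can be reproduced with a small simulation. The caption does not state the toy system, so this sketch assumes geometric Brownian motion, $\mathrm{d}x_t = x_t\,\mathrm{d}t + \sigma x_t\,\mathrm{d}B_t$, whose exact solution is $x_t = x_0 \exp\big((1-\sigma^2/2)\,t + \sigma B_t\big)$. The function name `simulate_toy`, the horizon `t_end=5.0`, and the path count are hypothetical choices. The median over paths is reported here rather than the mean, because for this assumed system the sample mean is dominated by rare large paths.

```python
import numpy as np

def simulate_toy(x0, sigma, t_end, n_steps, n_paths, seed=0):
    """Sample paths of dx = x dt + sigma * x dB via the exact GBM solution."""
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    # Brownian motion as the cumulative sum of N(0, dt) increments.
    dB = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    B = np.cumsum(dB, axis=1)
    t = dt * np.arange(1, n_steps + 1)
    return x0 * np.exp((1.0 - 0.5 * sigma**2) * t + sigma * B)

# sigma = 0 is the deterministic ODE dx = x dt, which grows like exp(t).
paths_ode = simulate_toy(1.0, 0.0, t_end=5.0, n_steps=500, n_paths=1000)
# sigma = 2.8 makes the exponent's drift 1 - sigma^2 / 2 negative.
paths_sde = simulate_toy(1.0, 2.8, t_end=5.0, n_steps=500, n_paths=1000)

print("final value, sigma=0:   ", paths_ode[0, -1])
print("median final, sigma=2.8:", np.median(paths_sde[:, -1]))
```

Under this assumption the noise-free trajectory grows exponentially, while a typical noisy trajectory decays toward zero because $1-\sigma^2/2 < 0$ for $\sigma = 2.8$.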

Theorems & Definitions (8)

  • Theorem 3.1
  • Definition 3.1: Lyapunov stability of SDE
  • Theorem 3.2
  • Corollary 3.2.1
  • Lemma A.1
  • proof
  • Lemma A.2
  • proof