
Efficient Training of Neural SDEs Using Stochastic Optimal Control

Rembert Daems, Manfred Opper, Guillaume Crevecoeur, Tolga Birdal

TL;DR

The paper addresses the computational bottleneck of variational inference for neural SDEs by introducing a stochastic optimal control framework that decomposes the variational posterior control into a linear, closed-form term and a learnable nonlinear residual. For linear, Gaussian priors, the optimal control is derived in closed form as $u(x,t)=\sigma(t)^\top \nabla_x \log \mathcal{N}(\mathbf O; \mathbf m_x, \mathbf C+\Sigma_0)$, which reduces further to a simple expression when $p(\mathbf X(T)\mid x)$ is Gaussian. The nonlinear residual is captured by a neural network, which keeps the model expressive while retaining the lower initial loss and faster convergence that the closed-form linear term provides. Experiments with models driven by Brownian motion (BM) and Markov-approximate fractional Brownian motion (MA-fBM) show faster convergence and lower loss with the hybrid approach, highlighting a practical path to efficient, uncertainty-aware time-series modeling with neural SDEs.
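
The closed-form control can be illustrated in the simplest setting. The sketch below is our own, not the authors' code; it assumes a scalar Ornstein-Uhlenbeck prior $dX = -aX\,dt + \sigma\,dW$, a single observation $O \sim \mathcal{N}(X(T), \Sigma_0)$, and reads $\mathbf m_x$ and $\mathbf C$ as the prior conditional mean and variance of $X(T)$ given $X(t)=x$. Under these assumptions $\mathbf m_x = e^{-a(T-t)}x$ and $\mathbf C$ does not depend on $x$, so the gradient is analytic: $u(x,t) = \sigma\, e^{-a(T-t)}(O - \mathbf m_x)/(\mathbf C + \Sigma_0)$. The function names are hypothetical.

```python
import math


def ou_transition(x, a, sigma, tau):
    """Prior conditional mean/variance of X(t + tau) given X(t) = x.

    Assumes the scalar OU prior dX = -a X dt + sigma dW (a >= 0).
    Returns (mean, transition factor phi, variance).
    """
    phi = math.exp(-a * tau)
    if a == 0.0:
        var = sigma**2 * tau  # Brownian-motion limit of the OU variance
    else:
        var = sigma**2 / (2.0 * a) * (1.0 - math.exp(-2.0 * a * tau))
    return phi * x, phi, var


def log_normal_pdf(o, mean, var):
    """Log density of N(mean, var) evaluated at o."""
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (o - mean) ** 2 / var


def optimal_control(x, t, T, obs, a, sigma, obs_var):
    """u(x, t) = sigma * d/dx log N(obs; m_x, C + obs_var), valid for t < T."""
    m_x, phi, C = ou_transition(x, a, sigma, T - t)
    # C does not depend on x, so only the mean m_x = phi * x contributes to the gradient.
    return sigma * phi * (obs - m_x) / (C + obs_var)


# Example: control pushing the state toward an observation O = 1.5 at T = 1.
print(optimal_control(x=0.3, t=0.2, T=1.0, obs=1.5, a=0.7, sigma=0.8, obs_var=0.05))
```

Setting $a = 0$ and $\Sigma_0 = 0$ recovers the Brownian-bridge drift $\sigma u = (O - x)/(T - t)$, which is a quick sanity check on the formula.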

Abstract

We present a hierarchical, control-theory-inspired method for variational inference (VI) for neural stochastic differential equations (SDEs). While VI for neural SDEs is a promising avenue for uncertainty-aware reasoning in time series, it is computationally challenging due to the iterative nature of maximizing the ELBO. In this work, we propose to decompose the control term into linear and residual non-linear components and derive an optimal control term for linear SDEs using stochastic optimal control. Modeling the non-linear component by a neural network, we show how to efficiently train neural SDEs without sacrificing their expressive power. Since the linear part of the control term is optimal and does not need to be learned, training starts at a lower cost and we observe faster convergence.
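
The linear-plus-residual decomposition can be sketched for the simplest linear prior. This is a minimal illustration under our own assumptions, not the authors' implementation: a scalar Brownian-motion prior $dX = \sigma\,dW$, one Gaussian observation of $X(T)$, a posterior SDE whose drift is the prior drift plus $\sigma u$, and a hypothetical tiny residual network `ResidualNet` whose output layer is zero-initialized (a design choice we assume so that training starts from the closed-form linear control). `sample_posterior` also accumulates $\tfrac12\int u^2\,dt$, the per-path KL contribution for this posterior form.

```python
import math
import random


def linear_control(x, t, T, obs, sigma, obs_var):
    """Closed-form control for a Brownian-motion prior dX = sigma dW and one
    observation obs ~ N(X(T), obs_var): sigma * d/dx log N(obs; x, sigma^2 (T - t) + obs_var)."""
    return sigma * (obs - x) / (sigma**2 * (T - t) + obs_var)


class ResidualNet:
    """Hypothetical one-hidden-layer network r_phi(x, t) for the non-linear residual."""

    def __init__(self, hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        # Zero output weights: at initialization the hybrid control equals the linear one.
        self.w2 = [0.0] * hidden

    def __call__(self, x, t):
        h = [math.tanh(wx * x + wt * t + b) for (wx, wt), b in zip(self.w1, self.b1)]
        return sum(w * hi for w, hi in zip(self.w2, h))


def hybrid_control(x, t, T, obs, sigma, obs_var, net):
    """u = u_linear (closed form, not learned) + r_phi (learned residual)."""
    return linear_control(x, t, T, obs, sigma, obs_var) + net(x, t)


def sample_posterior(x0, T, obs, sigma, obs_var, net, n_steps=100, seed=0):
    """Euler-Maruyama for the posterior SDE dX = sigma * u dt + sigma dW (zero prior drift).

    Returns X(T) and the path's KL contribution 0.5 * integral of u^2 dt.
    """
    rng = random.Random(seed)
    dt = T / n_steps
    x, kl = x0, 0.0
    for k in range(n_steps):
        t = k * dt
        u = hybrid_control(x, t, T, obs, sigma, obs_var, net)
        kl += 0.5 * u * u * dt
        x += sigma * u * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x, kl


net = ResidualNet()
finals = [sample_posterior(0.0, 1.0, 1.0, 1.0, 0.1, net, seed=s)[0] for s in range(1000)]
# Should be near 1 / 1.1 (about 0.91), the exact posterior mean of X(T) in this linear case.
print(sum(finals) / len(finals))
```

Because the residual starts at zero, the untrained hybrid model already reproduces the exact posterior of this linear toy problem; gradient-based training (not shown) would only need to learn the part of the control that the linear term cannot express.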

Paper Structure

This paper contains 9 sections, 2 theorems, 11 equations, and 1 figure.

Key Result

Proposition 1

The variational parameters $\phi$ are optimised by minimising the KL divergence between the posterior and the prior, where the corresponding evidence lower bound (ELBO) is maximised to find the most likely parameters $\theta$. The observations $\{O_i\}$ enter through the likelihoods $p_\theta\left(O_{i} \mid \tilde{X}(t_i)\right)$, and the expectation is taken over random paths of the approxim…
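
A standard way to write this bound, following the path-space formulation common in the cited VI-for-SDE literature, is sketched below; the paper's exact notation may differ. It assumes (our assumption) a prior $dX = b_\theta(X,t)\,dt + \sigma(t)\,dW$, an approximate posterior whose drift adds $\sigma(t)\,u_\phi$ while sharing the same diffusion, and a fixed initial state, so that Girsanov's theorem reduces the KL term to the expected integrated squared control.

```latex
\begin{aligned}
d\tilde{X}(t) &= \big[b_\theta(\tilde{X}(t),t) + \sigma(t)\,u_\phi(\tilde{X}(t),t)\big]\,dt + \sigma(t)\,dW(t),\\
\mathcal{L}(\theta,\phi) &= \mathbb{E}_{\tilde{X}}\Big[\sum_{i} \log p_\theta\big(O_i \mid \tilde{X}(t_i)\big)\Big]
 - \underbrace{\mathbb{E}_{\tilde{X}}\Big[\frac{1}{2}\int_0^T \big\|u_\phi(\tilde{X}(t),t)\big\|^2\,dt\Big]}_{\mathrm{KL}(\text{posterior}\,\|\,\text{prior})}.
\end{aligned}
```

Maximising $\mathcal{L}$ over $\phi$ tightens the bound, while maximising over $\theta$ fits the prior SDE to the observations.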

Figures (1)

  • Figure 1: We show the loss (negative ELBO) curves of the models driven by BM (left) and MA-fBM (right). For both experiments, our proposed hybrid model (green) starts training with a loss that is multiple orders of magnitude smaller and converges much faster than a standard non-linear neural network model (blue). Our hybrid model (green) also performs better than the strictly linear model (orange), especially for the MA-fBM experiment.

Theorems & Definitions (5)

  • Definition 1: SDEs driven by BM (BMSDE)
  • Definition 2: Posterior SDE
  • Proposition 1: Variational Inference for BMSDE (opper2019variational; li2020scalable)
  • Proposition 2
  • Proof: Sketch of the proof