Efficient Training of Neural Stochastic Differential Equations by Matching Finite Dimensional Distributions

Jianxin Zhang; Josh Viktorov; Doosan Jung; Emily Pitler

TL;DR

This work addresses the inefficiencies of training Neural SDEs with signature-kernel or adversarial objectives by introducing Finite Dimensional Matching (FDM), a strictly proper scoring-rule framework for continuous Markov processes. By converting a strictly proper scoring rule on $\mathbb{R}^{2d}$ into a process-level rule via averaging over two-time marginals, FDM enables an objective that scales as $O(D)$ per epoch, avoiding the PDEs and double integrals that burden prior methods. The authors prove the core scoring rule extension is strictly proper and provide rigorous sample-complexity and sensitivity analyses, complemented by empirical results across diverse financial and synthetic datasets where FDM consistently yields superior generative quality and faster training. The approach offers a principled, scalable alternative to GANs and signature-based methods, with practical impact for mesh-free time-series modeling in finance, physics, and biology. The work also envisions extensions to non-continuous or non-Markov settings, including càdlàg processes and hidden-Markov structures.

Abstract

Neural Stochastic Differential Equations (Neural SDEs) have emerged as powerful mesh-free generative models for continuous stochastic processes, with critical applications in fields such as finance, physics, and biology. Previous state-of-the-art methods have relied on adversarial training, such as GANs, or on minimizing distance measures between processes using signature kernels. However, GANs suffer from issues like instability, mode collapse, and the need for specialized training techniques, while signature kernel-based methods require solving linear PDEs and backpropagating gradients through the solver, whose computational complexity scales quadratically with the discretization steps. In this paper, we identify a novel class of strictly proper scoring rules for comparing continuous Markov processes. This theoretical finding naturally leads to a novel approach called Finite Dimensional Matching (FDM) for training Neural SDEs. Our method leverages the Markov property of SDEs to provide a computationally efficient training objective. This scoring rule allows us to bypass the computational overhead associated with signature kernels and reduces the training complexity from $O(D^2)$ to $O(D)$ per epoch, where $D$ represents the number of discretization steps of the process. We demonstrate that FDM achieves superior performance, consistently outperforming existing methods in terms of both computational efficiency and generative quality.
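To make the idea of matching two-time marginals concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the energy score as the strictly proper scoring rule on $\mathbb{R}^{2d}$, a uniformly sampled pair of grid indices, and hypothetical function names (`energy_score`, `fdm_loss`). A real training loop would use an autodiff framework and differentiate this loss through the SDE solver.

```python
import numpy as np

def energy_score(gen, obs):
    """Sample-based energy score (lower is better).

    gen: generated samples, shape [m, k]; obs: one observation, shape [k].
    The energy score is an assumed choice of strictly proper scoring rule.
    """
    m = gen.shape[0]
    term_obs = np.linalg.norm(gen - obs, axis=1).mean()
    pair = np.linalg.norm(gen[:, None, :] - gen[None, :, :], axis=2)
    term_self = pair.sum() / (m * (m - 1))  # diagonal is zero, so i == j is excluded
    return term_obs - 0.5 * term_self

def fdm_loss(gen_paths, real_paths, rng):
    """Score generated paths against real paths on one random pair of times.

    gen_paths: [m, D, d], real_paths: [n, D, d], both on the same time grid.
    """
    D = real_paths.shape[1]
    i, j = rng.integers(0, D, size=2)  # assumed: uniform over grid indices
    # Stack the two time slices into samples of the joint law on R^{2d}.
    gen_pair = np.concatenate([gen_paths[:, i], gen_paths[:, j]], axis=1)
    real_pair = np.concatenate([real_paths[:, i], real_paths[:, j]], axis=1)
    return np.mean([energy_score(gen_pair, y) for y in real_pair])
```

In this sketch the loss reads only two time slices per call, so its own cost does not depend on $D$; the dependence on the number of discretization steps enters only through simulating the paths.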

Paper Structure (20 sections, 10 theorems, 41 equations, 32 figures, 17 tables, 1 algorithm)

Introduction
Related Work
Scoring Rules
Neural SDEs
Preliminaries
Finite Dimensional Matching
Scoring Rule for Markov Process
FDM Algorithm
Theoretical Properties
Sample Complexity
Sensitivity
Experiments
Conclusion, Limitations, and Future Work
Proof of Theorem \ref{thm:scoring4continuous}
Proof of Sample Complexity
...and 5 more sections

Key Result

Theorem 2

If $s$ is a strictly proper scoring rule for distributions on $\mathcal{E} \times \mathcal{E}$, then $\bar{s}$ is a strictly proper scoring rule for $\mathcal{E}$-valued continuous Markov processes on $[0, T]$, where $T \in \mathbb{R}_{>0}$. That is, for any $\mathcal{E}$-valued continuous Markov processes ...
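The definition of $\bar{s}$ is not reproduced on this page. Based on the TL;DR's description of "averaging over two-time marginals", one plausible reading is the following sketch. The uniform distribution over time pairs and the notation $P_{t_1, t_2}$ for the joint law of $(X_{t_1}, X_{t_2})$ under $P$ are assumptions, not taken from the theorem statement above.

```latex
\bar{s}(P, x)
  = \mathbb{E}_{(t_1, t_2) \sim \mathcal{U}([0, T]^2)}
    \Big[ s\big( P_{t_1, t_2},\, (x_{t_1}, x_{t_2}) \big) \Big]
```

Under this reading, $s$ only ever compares distributions on $\mathcal{E} \times \mathcal{E}$, which matches the hypothesis of the theorem.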

Figures (32)

Figure 1: The dynamics of the joint distribution of gold and silver prices in the metal price data. Blue points are real samples and orange points are generated by Neural SDEs. Each row of plots corresponds to a method and each column corresponds to a timestamp. For each plot, the horizontal axis is the silver price and the vertical axis is the gold price.
Figure 2: Sample paths for silver (top) and gold (bottom) prices from the metal dataset. Blue lines represent real samples, while red lines represent those generated by Neural SDEs. From left to right, the plots correspond to signature kernels, truncated signature, SDE-GAN, and FDM, respectively. The horizontal axis represents time, and the vertical axis represents metal prices.
Figure 3: The dynamics of the joint distribution of gold and silver prices in the metal price data. Blue points are real samples and orange points are generated by Neural SDEs. Each row of plots corresponds to a method and each column corresponds to a timestamp. For each plot, the horizontal axis is the silver price and the vertical axis is the gold price.
Figure 4: The dynamics of the joint distribution of Dollar and USA30 in the U.S. stock indices data. Blue points are real samples and orange points are generated by Neural SDEs. Each row of plots corresponds to a method and each column corresponds to a timestamp. For each plot, the horizontal axis is Dollar (US Dollar Index) and the vertical axis is USA30 (USA 30 Index).
Figure 5: The dynamics of the joint distribution of Dollar and USA500 in the U.S. stock indices data. Blue points are real samples and orange points are generated by Neural SDEs. Each row of plots corresponds to a method and each column corresponds to a timestamp. For each plot, the horizontal axis is Dollar (US Dollar Index) and the vertical axis is USA500 (USA 500 Index).
...and 27 more figures

Theorems & Definitions (20)

Definition 1
Theorem 2
Theorem 3
Theorem 4
Lemma 5
proof
Definition 6
Theorem 7
proof: Proof for Theorem \ref{thm:scoring4continuous_general}
proof: Proof for Theorem \ref{thm:scoring4continuous}
...and 10 more
