Malliavin Calculus with Weak Derivatives for Counterfactual Stochastic Optimization
Vikram Krishnamurthy, Luke Snow
TL;DR
This paper tackles counterfactual stochastic optimization of conditional losses in diffusion models when the conditioning event is rare, so that sampling it directly is infeasible. It develops a kernel-free, two-stage framework. (i) A Skorohod-integral representation, derived via Malliavin calculus, expresses the conditional loss $L(\theta) = \mathbb{E}[\ell(X^{\theta}) \mid g(X^{\theta})=0]$ as a ratio of unconditional expectations, so Monte Carlo estimation with $N$ samples retains the standard $O(1/N)$ variance even when the conditioning event has probability zero. (ii) A weak-derivative gradient estimator based on the Hahn–Jordan decomposition has variance that stays $O(1)$ as the time horizon $T$ grows, avoiding the $O(T)$ variance growth of score-function methods. Together, these support an efficient counterfactual stochastic gradient algorithm for approximating local minima of $L(\theta)$, with a concrete implementation for an Ornstein–Uhlenbeck process. The framework connects Malliavin calculus, generator-based sensitivities, and discrete weak-derivative methods to deliver kernel-free, scalable optimization in rare-event regimes, with implications for passive learning and safety-constrained diffusion models.
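As a minimal sketch of the ratio idea in stage (i), consider the simplest possible case, assumed here purely for illustration: $X$ is a standard Brownian motion $W$, the conditioning event is $W_T = a$ (probability zero), and the loss is $\varphi(W_t)$ for some $t < T$. Taking $u = \mathbf{1}_{(t,T]}/(T-t)$ gives $\langle D W_T, u\rangle = 1$, $\langle D W_t, u\rangle = 0$, and Skorohod integral $\delta(u) = (W_T - W_t)/(T-t)$, so Malliavin integration by parts yields $\mathbb{E}[\varphi(W_t) \mid W_T = a] = \mathbb{E}[\varphi(W_t)\, H(W_T - a)\, \delta(u)] \,/\, \mathbb{E}[H(W_T - a)\, \delta(u)]$, with $H$ the Heaviside step. Both expectations are unconditional, so plain Monte Carlo applies. The function name `malliavin_conditional_mean` and the Brownian setting are hypothetical; this is not the paper's construction for general diffusions, where $\delta(u)$ is generally not available in closed form.

```python
import numpy as np


def malliavin_conditional_mean(phi, t, T, a, n_samples, rng):
    """Estimate E[phi(W_t) | W_T = a] for a standard Brownian motion W, 0 < t < T.

    Hypothetical illustration: uses the weight u = 1_{(t,T]} / (T - t), whose
    Skorohod integral is delta(u) = (W_T - W_t) / (T - t).
    """
    # Sample W_t and the independent increment W_T - W_t.
    w_t = np.sqrt(t) * rng.standard_normal(n_samples)
    w_T = w_t + np.sqrt(T - t) * rng.standard_normal(n_samples)

    # H(W_T - a) * delta(u): Heaviside indicator times the Skorohod weight.
    weight = (w_T >= a).astype(float) * (w_T - w_t) / (T - t)

    # Ratio of two unconditional Monte Carlo averages; the denominator
    # estimates the density of W_T at a.
    return np.mean(phi(w_t) * weight) / np.mean(weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    est = malliavin_conditional_mean(np.square, t=0.5, T=1.0, a=2.0,
                                     n_samples=1_000_000, rng=rng)
    print(f"estimate {est:.3f}, Brownian-bridge value 1.25")
```

The Brownian-bridge formula $\mathbb{E}[W_t^2 \mid W_T = a] = (t/T)^2 a^2 + t(T-t)/T$ gives $1.25$ for $t = 0.5$, $T = 1$, $a = 2$, which the estimate should approach as `n_samples` grows, even though no sample ever satisfies $W_T = a$ exactly. Choosing $u$ orthogonal to $D W_t$ removes the extra term involving $\varphi'$ that a generic weight would introduce.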
Abstract
We study counterfactual stochastic optimization of conditional loss functionals under misspecified and noisy gradient information. The difficulty is that when the conditioning event has vanishing or zero probability, naive Monte Carlo estimators are prohibitively inefficient; kernel smoothing, though common, suffers from slow convergence. We propose a two-stage kernel-free methodology. First, we show using Malliavin calculus that the conditional loss functional of a diffusion process admits an exact representation as a Skorohod integral, yielding variance comparable to that of classical Monte Carlo estimation. Second, we establish that a weak-derivative estimate of the gradient of the conditional loss functional with respect to the model parameters can be evaluated with variance that does not grow with the sample path length, in contrast to the widely used score-function method, whose variance grows linearly in the sample path length. Together, these results yield an efficient framework for counterfactual conditional stochastic gradient algorithms in rare-event regimes.
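To illustrate the variance contrast in the second stage, the sketch below uses a hypothetical AR(1) model $X_{k+1} = aX_k + \theta + s\,\varepsilon_k$ (an Euler-type discretization of an Ornstein–Uhlenbeck process) with loss $f(X_n)$, and differentiates with respect to the drift $\theta$. For a Gaussian transition with mean $\mu$, the derivative of the density in $\mu$ splits (Hahn–Jordan) into $c\,(p^{+} - p^{-})$ with $c = 1/(s\sqrt{2\pi})$, where $p^{\pm}$ are the laws of $\mu \pm sR$ and $R$ is standard Rayleigh. Summing over which transition is perturbed gives a weak-derivative estimator; the score-function estimator instead multiplies the loss by $\sum_k \varepsilon_k / s$. The model, the function names, and the antithetic coupling of $p^{+}$ and $p^{-}$ (same $R$ for both) are illustrative assumptions, not the paper's OU implementation; the shortcut for $X_n$ after a perturbation relies on the dynamics being linear.

```python
import numpy as np

# Hypothetical model: X_{k+1} = a*X_k + theta + s*eps_k, X_0 = 0, eps_k ~ N(0, 1),
# loss f(X_n); both estimators target d/dtheta E[f(X_n)].


def simulate_terminal(eps, a, theta, s):
    """Terminal value X_n for each row of standard-normal innovations eps (m, n)."""
    x = np.zeros(eps.shape[0])
    for k in range(eps.shape[1]):
        x = a * x + theta + s * eps[:, k]
    return x


def score_function_grad(f, a, theta, s, n, m, rng):
    """m score-function samples: f(X_n) * sum_k d/dtheta log p(X_{k+1} | X_k)."""
    eps = rng.standard_normal((m, n))
    x = simulate_terminal(eps, a, theta, s)
    # d/dtheta log N(X_{k+1}; a X_k + theta, s^2) = eps_k / s
    return f(x) * eps.sum(axis=1) / s


def weak_derivative_grad(f, a, theta, s, n, m, rng):
    """m weak-derivative samples, summing over which transition is perturbed."""
    eps = rng.standard_normal((m, n))
    x = simulate_terminal(eps, a, theta, s)
    # Hahn-Jordan parts of d/dmu N(mu, s^2): c * (law of mu + s R minus law of mu - s R),
    # with R standard Rayleigh and c = 1 / (s sqrt(2 pi)).
    r = rng.rayleigh(scale=1.0, size=(m, n))
    c = 1.0 / (s * np.sqrt(2.0 * np.pi))
    # Linear dynamics: changing the noise of transition k by d shifts X_n by a**(n-1-k) * d.
    gain = a ** (n - 1 - np.arange(n))
    x_plus = x[:, None] + gain * s * (r - eps)
    x_minus = x[:, None] + gain * s * (-r - eps)
    return c * (f(x_plus) - f(x_minus)).sum(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    for n in (20, 200):
        sf = score_function_grad(lambda x: x, 0.9, 0.0, 1.0, n, 10_000, rng)
        wd = weak_derivative_grad(lambda x: x, 0.9, 0.0, 1.0, n, 10_000, rng)
        print(n, sf.var(), wd.var(), (1 - 0.9 ** n) / 0.1)
```

In this linear-Gaussian case ($a = 0.9$, $\theta = 0$, $s = 1$, $f(x) = x$) one can check by hand that the score-function variance is about $181$ at $n = 20$ and about $1153$ at $n = 200$, growing roughly linearly in $n$, while the weak-derivative variance stays near $1.4$ for both; both estimators target the same gradient $(1 - a^n)/(1 - a)$.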
