A Fast Solver for Interpolating Stochastic Differential Equation Diffusion Models for Speech Restoration

Bunlong Lay, Timo Gerkmann

TL;DR

This work develops a formalism of interpolating Stochastic Differential Equations (iSDEs) that includes SGMSE+, and proposes a solver for iSDEs that enables fast sampling with as few as 10 Neural Network evaluations across multiple speech restoration tasks.

Abstract

Diffusion Probabilistic Models (DPMs) are a well-established class of diffusion models for unconditional image generation, while SGMSE+ is a well-established conditional diffusion model for speech enhancement. One of the downsides of diffusion models is that solving the reverse process requires many evaluations of a large Neural Network. Although advanced fast sampling solvers have been developed for DPMs, they are not directly applicable to models such as SGMSE+ due to differences in their diffusion processes. Specifically, DPMs transform between the data distribution and a standard Gaussian distribution, whereas SGMSE+ interpolates between the target distribution and a noisy observation. This work first develops a formalism of interpolating Stochastic Differential Equations (iSDEs) that includes SGMSE+, and second proposes a solver for iSDEs. The proposed solver enables fast sampling with as few as 10 Neural Network evaluations across multiple speech restoration tasks.
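
The contrast between the two kinds of forward process can be made concrete with a small numerical sketch. It is an illustration under stated assumptions, not the paper's formalism: the exponential weight `exp(-gamma * t)` follows the Ornstein-Uhlenbeck-style mean commonly associated with SGMSE+, the constant-`beta` variance-preserving mean stands in for a generic DPM, and the function names and the values of `gamma` and `beta` are hypothetical choices made only for this example.

```python
import numpy as np

def interpolating_mean(x0, y, t, gamma=1.5):
    """Mean of an interpolating forward process (assumed OU-style weight).

    Equals the clean target x0 at t = 0 and drifts toward the noisy
    observation y as t grows.
    """
    w = np.exp(-gamma * t)
    return w * x0 + (1.0 - w) * y

def dpm_mean(x0, t, beta=20.0):
    """Mean of a variance-preserving DPM with constant beta (assumed).

    Equals x0 at t = 0 and shrinks toward zero, the mean of the
    standard Gaussian prior, independently of any observation.
    """
    return np.exp(-0.5 * beta * t) * x0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0 = rng.standard_normal(4)             # stand-in for clean speech
    y = x0 + 0.5 * rng.standard_normal(4)   # stand-in for the noisy observation
    for t in (0.0, 0.5, 1.0):
        print(t, interpolating_mean(x0, y, t), dpm_mean(x0, t))
```

In this sketch the DPM mean forgets the data entirely, whereas the interpolating mean ends near the observation `y`, which is one way to see why solvers derived for the first case do not carry over unchanged to the second.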

Paper Structure (29 sections, 39 equations, 1 figure, 2 tables, 1 algorithm)

Figures (1)

  • Figure 1: Results on the different tasks with different samplers. For all tasks, the adaptive RK45 solver uses more than 40 neural function evaluations (NFE). Audio examples can be found in the supplementary materials.