A Fast Solver for Interpolating Stochastic Differential Equation Diffusion Models for Speech Restoration

Bunlong Lay; Timo Gerkmann

TL;DR

This work develops a formalism of interpolating Stochastic Differential Equations (iSDEs) that includes SGMSE+, and proposes a solver for iSDEs that enables fast sampling with as few as 10 Neural Network evaluations across multiple speech restoration tasks.

Abstract

Diffusion Probabilistic Models (DPMs) are a well-established class of diffusion models for unconditional image generation, while SGMSE+ is a well-established conditional diffusion model for speech enhancement. One of the downsides of diffusion models is that solving the reverse process requires many evaluations of a large Neural Network. Although advanced fast sampling solvers have been developed for DPMs, they are not directly applicable to models such as SGMSE+ due to differences in their diffusion processes. Specifically, DPMs transform between the data distribution and a standard Gaussian distribution, whereas SGMSE+ interpolates between the target distribution and a noisy observation. This work first develops a formalism of interpolating Stochastic Differential Equations (iSDEs) that includes SGMSE+, and second proposes a solver for iSDEs. The proposed solver enables fast sampling with as few as 10 Neural Network evaluations across multiple speech restoration tasks.
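The contrast drawn in the abstract, between a process that ends in a standard Gaussian and one that interpolates toward the noisy observation, can be illustrated with the means of the two forward processes. The following is a minimal sketch, not the paper's solver. It assumes a variance-preserving DPM with a constant noise rate `beta` and an Ornstein-Uhlenbeck-style drift `gamma * (y - x)` for the interpolating case. The function names, `beta`, `gamma`, and the toy vectors are hypothetical choices for illustration only.

```python
import math

def dpm_mean(x0, t, beta=5.0):
    """Mean of a variance-preserving forward process with constant rate beta
    (assumed): it decays from the clean sample x0 toward zero."""
    scale = math.exp(-0.5 * beta * t)
    return [scale * v for v in x0]

def interpolating_mean(x0, y, t, gamma=1.5):
    """Mean under an assumed drift gamma * (y - x): it moves from the clean
    sample x0 at t = 0 toward the noisy observation y as t grows."""
    w = math.exp(-gamma * t)
    return [w * a + (1.0 - w) * b for a, b in zip(x0, y)]

# Toy two-dimensional "signals" (hypothetical values).
x0 = [1.0, -0.5]   # clean target
y = [1.4, 0.1]     # noisy observation

for t in (0.0, 0.5, 1.0):
    print(t, dpm_mean(x0, t), interpolating_mean(x0, y, t))
```

At t = 0 both means equal the clean sample. As t grows, the DPM mean shrinks toward zero while the interpolating mean approaches the observation, which is why a solver derived for the former does not transfer directly to the latter.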

Paper Structure (29 sections, 39 equations, 1 figure, 2 tables, 1 algorithm)

Introduction
Diffusion Models
Stochastic Differential Equations
Interpolating SDEs: Unifying Conditional Diffusion
Reverse Process
Runge-Kutta ODE Solvers
Existing Fast DPM-Solver
Proposed Fast ISDE-Solver
Experimental Setup
Data representation
Data set and audio tasks
Noise reduction
Bandwidth Extension
Dereverberation
MP3 Decoding
...and 14 more sections

Figures (1)

Figure 1: Results on the different tasks with different samplers. For all tasks, adaptive RK45 uses more than 40 neural network evaluations (NFE). Audio examples can be found in the supplementary materials.
