WaveTransfer: A Flexible End-to-end Multi-instrument Timbre Transfer with Diffusion

Teysir Baoueb; Xiaoyu Bie; Hicham Janati; Gael Richard

TL;DR

WaveTransfer is an end-to-end diffusion model for timbre transfer. A single model handles multiple types of timbre transfer between unique instrument pairs, eliminating the need to train a separate model for each pairing.

Abstract

As diffusion-based deep generative models gain prevalence, researchers are actively investigating their potential applications across various domains, including music synthesis and style alteration. In this work, we focus on timbre transfer, a process that alters the instrumental characteristics of a musical piece while preserving its essential musical elements. This paper introduces WaveTransfer, an end-to-end diffusion model designed for timbre transfer. We specifically employ the bilateral denoising diffusion model (BDDM) to search for the noise schedule. Our model can perform timbre transfer between audio mixtures as well as between individual instruments. Notably, it accommodates multiple types of timbre transfer between unique instrument pairs in a single model, eliminating the need to train a separate model for each pairing. Furthermore, unlike recent works limited to 16 kHz, WaveTransfer can be trained at various sampling rates, including the industry-standard 44.1 kHz, a feature of particular interest to the music community.

Paper Structure (18 sections, 7 equations, 3 figures, 4 tables)

Introduction
Background
Denoising diffusion probabilistic models (DDPM)
Bilateral denoising diffusion models (BDDM)
Proposed Method
Training procedure
Model architecture
Inference procedure
Experiments
Dataset and preprocessing
Training setup
Metrics for evaluation
Inference noise schedules
Results
Inference conducted with global models
...and 3 more sections

Figures (3)

Figure 1: Timbre transfer using diffusion models. The objective is to generate a target audio ${\mathbf{x}}_0^A$ from a random noise ${\mathbf{x}}_T^A$ and a conditioning audio ${\mathbf{x}}_0^B$, where ${\mathbf{x}}_0^A$ has the same content as ${\mathbf{x}}_0^B$ but is played with a different instrument.
Figure 2: Training process of WaveTransfer
Figure 3: Inference process of WaveTransfer
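
The setup described in the Figure 1 caption can be illustrated with a minimal sketch of conditional reverse diffusion, written here as standard DDPM ancestral sampling. Everything in it is an assumption made for illustration: `toy_noise_predictor` is a hypothetical stand-in for a trained network, the linear `betas` schedule is arbitrary (the paper searches for its inference schedule with BDDM, which is not reproduced here), and the function names are not taken from the authors' code.

```python
import numpy as np


def toy_noise_predictor(x_t, cond, t):
    """Hypothetical stand-in for a trained noise-prediction network.

    It is NOT a learned model: it simply treats the deviation of x_t
    from the conditioning signal as the noise estimate.
    """
    return x_t - cond


def ddpm_sample(cond, betas, predictor, rng):
    """Standard DDPM ancestral sampling, conditioned on `cond` (x_0^B)."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from pure Gaussian noise, playing the role of x_T^A.
    x = rng.standard_normal(cond.shape)

    for t in range(len(betas) - 1, -1, -1):
        eps = predictor(x, cond, t)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Assumed variance choice: sigma_t^2 = beta_t.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(cond.shape)
        else:
            # No noise is added at the final step.
            x = mean
    return x


# Toy usage: a short sine wave stands in for the conditioning audio x_0^B.
cond = np.sin(np.linspace(0.0, 2.0 * np.pi, 64))
betas = np.linspace(1e-4, 0.05, 50)  # arbitrary illustrative schedule
out = ddpm_sample(cond, betas, toy_noise_predictor, np.random.default_rng(0))
```

In a real system the predictor would be a neural network trained on paired recordings, and the schedule would be far shorter or longer depending on the quality and speed trade-off; the loop structure is the only part meant to carry over.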
