Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data

Yu-Hua Chen, Woosung Choi, Wei-Hsiang Liao, Marco Martínez-Ramírez, Kin Wai Cheuk, Yuki Mitsufuji, Jyh-Shing Roger Jang, Yi-Hsuan Yang

TL;DR

Drawing inspiration from recent advancements in neural vocoders, the GAN-based model for guitar amplifier modeling employs two sets of discriminators, one based on the multi-scale discriminator (MSD) and the other on the multi-period discriminator (MPD).

Abstract

Recent years have seen increasing interest in applying deep learning methods to the modeling of guitar amplifiers or effect pedals. Existing methods are mainly based on the supervised approach, requiring temporally-aligned data pairs of unprocessed and rendered audio. However, this approach does not scale well, due to the complicated process involved in creating the data pairs. A very recent work done by Wright et al. has explored the potential of leveraging unpaired data for training, using a generative adversarial network (GAN)-based framework. This paper extends their work by using more advanced discriminators in the GAN, and using more unpaired data for training. Specifically, drawing inspiration from recent advancements in neural vocoders, we employ in our GAN-based model for guitar amplifier modeling two sets of discriminators, one based on multi-scale discriminator (MSD) and the other multi-period discriminator (MPD). Moreover, we experiment with adding unprocessed audio signals that do not have the corresponding rendered audio of a target tone to the training data, to see how much the GAN model benefits from the unpaired data. Our experiments show that the proposed two extensions contribute to the modeling of both low-gain and high-gain guitar amplifiers.
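
As a rough illustration of how the two discriminator families named above differ, the sketch below shows only the input views they typically operate on in HiFi-GAN-style vocoders: an MPD folds the waveform into a 2-D grid by a fixed period, while an MSD looks at average-pooled (downsampled) copies of the waveform. This is not the authors' implementation. The function names `mpd_view` and `msd_view`, the toy signals, and the kernel and stride values are assumptions made for illustration, and the convolutional discriminator networks themselves are omitted.

```python
# Illustrative sketch only: input views for HiFi-GAN-style discriminators.
# All names and parameter values here are hypothetical, not from the paper.

def mpd_view(signal, period):
    """Fold a 1-D signal into rows of length `period` (MPD-style view).

    Column j of the result holds samples j, j + period, j + 2 * period, ...,
    so a 2-D convolution over this grid can pick up periodic structure.
    The tail is reflect-padded to a multiple of `period`
    (assumes the signal is longer than `period`).
    """
    x = list(signal)
    remainder = len(x) % period
    if remainder:
        pad = period - remainder
        # Reflect padding: mirror the samples just before the last one.
        x += x[-2:-2 - pad:-1]
    return [x[i:i + period] for i in range(0, len(x), period)]


def msd_view(signal, kernel=4, stride=2):
    """Average-pool a 1-D signal (simplified MSD-style view, no padding).

    Each output sample is the mean of `kernel` consecutive input samples,
    advancing by `stride`, which yields a smoothed, downsampled waveform.
    """
    x = list(signal)
    return [
        sum(x[i:i + kernel]) / kernel
        for i in range(0, len(x) - kernel + 1, stride)
    ]


if __name__ == "__main__":
    print(mpd_view(range(7), period=3))
    print(msd_view(range(8)))
```

In a full model, each period (for MPD) and each pooling level (for MSD) would feed its own sub-discriminator. The specific periods and scales used in the paper are not stated in this summary.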

Paper Structure (24 sections, 5 equations, 3 figures, 4 tables)

This paper contains 24 sections, 5 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Diagram of the proposed GAN-based model for virtual analog (VA) modeling, using clean audio that may not be matched and aligned with the target audio segment, and using two types of discriminators: MSD and MPD [kong2020hifi].
  • Figure 2: Mel-spectrogram of the target audio signal, along with those generated by the supervised baseline [8682805] and the proposed MSD+MPD GAN-based model given the corresponding clean audio signal. The target audio is sampled from the test set of EGFxset [pedroza2022egfxset], with BD-2 being the target tone. High-frequency harmonics are missing in the top-right corner of the middle mel-spectrogram, the one generated by the supervised baseline.
  • Figure 3: Mel-spectrograms of the target audio, the supervised baseline, and the proposed MSD+MPD model, for a sample from the EGDB Fender test set. Both the supervised baseline and our MSD+MPD model exhibit artifacts, which appear as high-frequency content that is not present in the target mel-spectrogram.