ICGAN: An implicit conditioning method for interpretable feature control of neural audio synthesis

Yunyi Liu, Craig Jin

TL;DR

This paper proposes an implicit conditioning method for GAN-based neural audio synthesis. The method allows interpretable control over the acoustic features of synthesized sounds and creates a continuous conditioning space that enables timbre manipulation without explicit labels.

Abstract

Neural audio synthesis methods can achieve high-fidelity and realistic sound generation by utilizing deep generative models. Such models typically rely on external labels, which are often discrete, as conditioning information to achieve guided sound generation. However, it remains difficult to control subtle changes in sounds without appropriate and descriptive labels, especially given a limited dataset. This paper proposes an implicit conditioning method for neural audio synthesis using generative adversarial networks that allows for interpretable control of the acoustic features of synthesized sounds. Our technique creates a continuous conditioning space that enables timbre manipulation without relying on explicit labels. We further introduce an evaluation metric to explore controllability and demonstrate that our approach is effective in enabling a degree of controlled variation of different synthesized sound effects for in-domain and cross-domain sounds.

Paper Structure

This paper contains 16 sections, 7 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Proposed model architecture. The model is composed of a CNN encoder classifier, an RNN generator, and a CNN discriminator. Yellow boxes are part of the neural networks. Blue boxes indicate training inputs and outputs, while green boxes represent explicit variables.
  • Figure 2: Visualization of the loss trend in our ICGAN model. The model generally converges after 300k iterations, by which point all three losses are stable. The regularizer loss converges quickly, while the discriminator and generator losses oscillate around -0.5 and 4, respectively.
  • Figure 3: Effect of interpolating points in the conditioning space. We randomly selected four combinations of interpolation between two classes in each sound category. The horizontal axis represents the interpolation points between two targets, with one increasing from 0 to 1 and the other from 1 to 0. The vertical axis represents the probability for that class returned by a trained classifier.
  • Figure 4: Generated in-class sounds. The models are trained on the same footsteps category, with each variation denoted by its property.
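
The interpolation described in the Figure 3 caption, where one target's weight rises from 0 to 1 while the other's falls from 1 to 0, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `interpolate_conditions` and the two-dimensional vectors `class_a` and `class_b` are hypothetical, and the sketch assumes a plain linear blend between two conditioning vectors. The generator and the trained classifier that would consume these points are not shown here.

```python
def interpolate_conditions(cond_a, cond_b, num_steps):
    """Return num_steps points on the straight line from cond_a to cond_b.

    Assumption: conditioning vectors are plain lists of floats and the
    blend is linear; the paper's actual conditioning space may differ.
    """
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    if len(cond_a) != len(cond_b):
        raise ValueError("conditioning vectors must have the same length")

    path = []
    for i in range(num_steps):
        w = i / (num_steps - 1)  # weight of cond_b rises from 0 to 1
        # The weight of cond_a is (1 - w), so it falls from 1 to 0.
        path.append([(1 - w) * a + w * b for a, b in zip(cond_a, cond_b)])
    return path


# Hypothetical conditioning vectors standing in for two sound classes.
class_a = [1.0, 0.0]
class_b = [0.0, 1.0]
path = interpolate_conditions(class_a, class_b, 5)
```

In a setup like the one the caption describes, each point in `path` would be fed to the generator, and a trained classifier would score the resulting sound for each of the two classes.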