PitchFlower: A flow-based neural audio codec with pitch controllability

Diego Torres, Axel Roebel, Nicolas Obin

TL;DR

PitchFlower is a flow-based neural audio codec with explicit pitch controllability. Beyond pitch, the framework provides a simple and extensible path toward disentangling other speech attributes.

Abstract

We present PitchFlower, a flow-based neural audio codec with explicit pitch controllability. Our approach enforces disentanglement through a simple perturbation: during training, F0 contours are flattened and randomly shifted, while the true F0 is provided as conditioning. A vector-quantization bottleneck prevents pitch recovery, and a flow-based decoder generates high-quality audio. Experiments show that PitchFlower achieves more accurate pitch control than WORLD at much higher audio quality, and outperforms SiFiGAN in controllability while maintaining comparable quality. Beyond pitch, this framework provides a simple and extensible path toward disentangling other speech attributes.
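The flatten-and-shift perturbation described in the abstract can be sketched on the F0 contour alone. This is a minimal illustration under our own assumptions, not the authors' code: F0 is a per-frame list in Hz with 0.0 marking unvoiced frames, "flattening" replaces every voiced frame with the geometric mean of the voiced F0, and the "random shift" is a uniform offset of up to ±12 semitones. The function name `flatten_and_shift_f0` and the shift range are hypothetical, and the step that re-synthesizes audio at the target contour (for example, with a vocoder) is not shown.

```python
import math
import random
from typing import List, Optional


def flatten_and_shift_f0(
    f0: List[float],
    max_shift_semitones: float = 12.0,  # hypothetical range, not from the paper
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return a flat, randomly shifted target F0 contour (illustrative sketch).

    Assumed convention: f0 is per-frame F0 in Hz, 0.0 marks unvoiced frames.
    """
    rng = rng if rng is not None else random.Random()
    voiced = [f for f in f0 if f > 0.0]
    if not voiced:
        # Nothing to flatten in a fully unvoiced utterance.
        return list(f0)
    # Flatten: average in the log-frequency domain (geometric mean of voiced F0).
    log_mean = sum(math.log(f) for f in voiced) / len(voiced)
    # Shift: uniform random offset in semitones (12 semitones = one octave).
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones)
    target = math.exp(log_mean) * 2.0 ** (shift / 12.0)
    # Keep the voicing pattern; only voiced frames receive the constant target.
    return [target if f > 0.0 else 0.0 for f in f0]


if __name__ == "__main__":
    print(flatten_and_shift_f0([0.0, 110.0, 220.0, 0.0], rng=random.Random(0)))
```

In this reading, audio re-synthesized at the flat target contour would be fed to the encoder while the original `f0` is passed to the decoder as conditioning, so the decoder has to take pitch from the conditioning rather than from the quantized latent.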

Paper Structure

This paper contains 14 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Architecture and training methodology of PitchFlower.
  • Figure 2: Objective comparison of different disentanglement strategies.
  • Figure 3: Objective evaluation results comparing PitchFlower with baselines. An alternative version of our model, PitchFlowerUV, is also considered.
  • Figure 4: Pitch control for different sizes and types of bottleneck.
  • Figure 5: (a) UTMOS score when changing the number of flow steps. (b) UTMOS and F0RMSE curves for different values of the CFG scale.
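Figure 5 varies the number of flow steps and the CFG (classifier-free guidance) scale. The sketch below shows how these two knobs typically interact in a flow-matching sampler. It assumes, without confirmation from this summary, that the decoder is sampled by fixed-step Euler integration of a learned velocity field from t = 0 to t = 1, and that guidance mixes conditional and unconditional velocities as v_u + w (v_c - v_u); some works use an equivalent reparametrization of w. `velocity_fn`, its signature, and the toy constant field are hypothetical stand-ins for the trained network.

```python
from typing import Callable, List

Vector = List[float]
# Hypothetical velocity network: (x, t, use_condition) -> velocity.
# With use_condition=False the conditioning (e.g., F0) is assumed to be dropped.
VelocityFn = Callable[[Vector, float, bool], Vector]


def guided_velocity(velocity_fn: VelocityFn, x: Vector, t: float, cfg_scale: float) -> Vector:
    """Classifier-free guidance: v_u + w * (v_c - v_u)."""
    v_cond = velocity_fn(x, t, True)
    v_uncond = velocity_fn(x, t, False)
    return [vu + cfg_scale * (vc - vu) for vc, vu in zip(v_cond, v_uncond)]


def euler_sample(velocity_fn: VelocityFn, x0: Vector, num_steps: int, cfg_scale: float) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with fixed-step Euler."""
    if num_steps < 1:
        raise ValueError("num_steps must be >= 1")
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        v = guided_velocity(velocity_fn, x, i * dt, cfg_scale)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    # Toy field: the conditional branch moves +1 per unit time, the unconditional one stays put.
    def toy_velocity(x: Vector, t: float, use_condition: bool) -> Vector:
        return [1.0 if use_condition else 0.0 for _ in x]

    for w in (0.0, 1.0, 2.0):
        print(w, euler_sample(toy_velocity, [0.0, 0.5], num_steps=8, cfg_scale=w))
```

With w = 1 the sampler follows the conditional model alone, w = 0 ignores the conditioning, and w > 1 extrapolates toward it. Each step costs two network evaluations under guidance, so the step count and the CFG scale jointly set the cost-quality trade-off.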