SRC-gAudio: Sampling-Rate-Controlled Audio Generation

Chenxing Li, Manjie Xu, Dong Yu

TL;DR

SRC-gAudio is a novel text-to-audio generation model that produces audio across a wide range of sampling rates with a single unified model architecture.

Abstract

We introduce SRC-gAudio, a novel audio generation model designed to facilitate text-to-audio generation across a wide range of sampling rates within a single model architecture. SRC-gAudio incorporates the sampling rate as part of the generation condition to guide the diffusion-based audio generation process. Our model enables the generation of audio at multiple sampling rates with a single unified model. Furthermore, we explore the potential benefits of large-scale, low-sampling-rate data in enhancing the generation quality of high-sampling-rate audio. Through extensive experiments, we demonstrate that SRC-gAudio effectively generates audio under controlled sampling rates. Additionally, our results indicate that pre-training on low-sampling-rate data can lead to significant improvements in audio quality across various metrics.
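
The abstract states that the sampling rate is part of the generation condition but does not say how it is encoded. As a rough illustration only, the sketch below assumes one plausible scheme: a sinusoidal embedding of the sampling rate (in kHz) concatenated with a text embedding to form the conditioning vector passed to a diffusion model. The function names `sampling_rate_embedding` and `build_condition`, the embedding size, and the concatenation itself are hypothetical and are not taken from the paper.

```python
import math


def sampling_rate_embedding(sample_rate_hz, dim=8, max_period=10000.0):
    """Hypothetical sinusoidal embedding of a scalar sampling rate.

    Assumption: the rate is scaled to kHz and encoded like a diffusion
    timestep embedding. The paper's actual encoding may differ.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    half = dim // 2
    value = sample_rate_hz / 1000.0  # work in kHz to keep arguments small
    # Geometrically spaced frequencies from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    sines = [math.sin(value * f) for f in freqs]
    cosines = [math.cos(value * f) for f in freqs]
    return sines + cosines


def build_condition(text_embedding, sample_rate_hz, dim=8):
    """Concatenate a text embedding with the sampling-rate embedding.

    Assumption: conditioning is a simple concatenation; a real model
    could instead add, cross-attend to, or otherwise fuse the two signals.
    """
    return list(text_embedding) + sampling_rate_embedding(sample_rate_hz, dim)


if __name__ == "__main__":
    text_embedding = [0.1, 0.2, 0.3]  # placeholder for a text encoder output
    for rate in (16000, 44100):
        cond = build_condition(text_embedding, rate)
        print(rate, len(cond))
```

Under these assumptions, the same text prompt yields a different conditioning vector for each target sampling rate, which is what would let a single model be steered toward a requested rate.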

Paper Structure

This paper contains 16 sections, 4 equations, 1 figure, 2 tables.

Figures (1)

  • Figure 1: The overview of the proposed method.