Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Xueyao Zhang, Liumeng Xue, Yicheng Gu, Yuancheng Wang, Jiaqi Li, Haorui He, Chaoren Wang, Songting Liu, Xi Chen, Junan Zhang, Zihao Fang, Haopeng Chen, Tze Ying Tang, Lexiao Zou, Mingxuan Wang, Jun Han, Kai Chen, Haizhou Li, Zhizheng Wu

TL;DR

This paper presents a high-level overview of Amphion, an open-source toolkit for Audio, Music, and Speech Generation. Amphion provides a unified framework that covers diverse generation tasks and models and can be easily extended to incorporate new ones.

Abstract

Amphion is an open-source toolkit for Audio, Music, and Speech Generation that aims to ease the way for junior researchers and engineers into these fields. It presents a unified framework that includes diverse generation tasks and models and can be easily extended to incorporate new ones. The toolkit is designed with beginner-friendly workflows and pre-trained models, allowing both beginners and seasoned researchers to kick-start their projects with relative ease. The initial release, Amphion v0.1, supports a range of tasks including Text to Speech (TTS), Text to Audio (TTA), and Singing Voice Conversion (SVC), supplemented by essential components such as data preprocessing, state-of-the-art vocoders, and evaluation metrics. This paper presents a high-level overview of Amphion. Amphion is open-sourced at https://github.com/open-mmlab/Amphion.

Paper Structure

This paper contains 14 sections, 2 figures, 8 tables.

Figures (2)

  • Figure 1: The north-star goal of the Amphion toolkit: "Any to Audio".
  • Figure 2: System architecture design of Amphion.