Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization

Sang-Hoon Lee; Ha-Yeong Choi; Seong-Whan Lee

TL;DR

This work addresses the slow sampling and limited high-frequency fidelity of conditional flow matching (CFM) for waveform generation. It introduces PeriodWave-Turbo, which fine-tunes a pre-trained CFM generator into a fixed-step, few-step ODE sampler using reconstruction losses and adversarial feedback. PeriodWave-Turbo achieves state-of-the-art objective and subjective scores with 2–4 inference steps after only 1,000 fine-tuning steps. Key innovations include the fixed-step generator and a multi-term loss that combines adversarial, reconstruction, and feature-matching objectives. The work also studies several model sizes (S/B/L), showing that performance scales with size while efficiency is maintained. The approach delivers substantial speedups, raises PESQ on LibriTTS to 4.454, and is robust in out-of-distribution (OOD) and two-stage TTS scenarios. The authors plan to release code and checkpoints for reproducibility.

Abstract

This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model trained via adversarial flow matching optimization. Conditional flow matching (CFM) generative models have recently been adopted with success for waveform generation, using a single vector field estimation objective for training. These models can generate high-fidelity waveforms, but they require significantly more ODE steps than GAN-based models, which need only a single generation step. Moreover, the generated samples often lack high-frequency information, because noisy vector field estimation does not ensure high-frequency reproduction. To address these limitations, we modify pre-trained CFM-based generative models into fixed-step generators. We then use reconstruction losses and adversarial feedback to accelerate high-fidelity waveform generation. With adversarial flow matching optimization, only 1,000 steps of fine-tuning are needed to achieve state-of-the-art performance across various objective metrics. In addition, we reduce the number of inference steps from 16 to 2 or 4. Finally, we scale up the PeriodWave backbone from 29M to 70M parameters for improved generalization. With this larger backbone, PeriodWave-Turbo achieves unprecedented performance, with a perceptual evaluation of speech quality (PESQ) score of 4.454 on the LibriTTS dataset. Audio samples, source code, and checkpoints will be available at https://github.com/sh-lee-prml/PeriodWave.
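The sketch below makes the abstract's step counts concrete. It shows a conditional flow matching regression target and a fixed-step Euler ODE sampler. It assumes the common straight (optimal-transport) noise-to-data path with sigma_min = 0 and plain Euler integration. PeriodWave's actual path parameterization, mel-spectrogram conditioning, and solver are not described in this section, and the function names (`cfm_pair`, `cfm_loss`, `euler_sample`) are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Vec = List[float]


def cfm_pair(x1: Vec, t: float, rng: random.Random) -> Tuple[Vec, Vec, Vec]:
    """Sample noise x0, the point x_t on the straight path, and the target velocity.

    Assumption (not confirmed by the source): linear path
    x_t = (1 - t) * x0 + t * x1 with sigma_min = 0, so the target is u = x1 - x0.
    """
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]  # standard Gaussian noise
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u = [b - a for a, b in zip(x0, x1)]
    return x0, xt, u


def cfm_loss(v_pred: Vec, u: Vec) -> float:
    """Mean squared error between the predicted and target vector fields."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, u)) / len(u)


def euler_sample(field: Callable[[Vec, float], Vec], x0: Vec, num_steps: int) -> Vec:
    """Integrate dx/dt = field(x, t) from t = 0 to t = 1 with a fixed number of Euler steps.

    Each step costs one vector-field evaluation, so 16 steps means 16 network
    calls, while 2 or 4 steps means 2 or 4 calls.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = field(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    x1 = [0.3, -0.7, 1.2]  # toy stand-in for a waveform segment
    x0, xt, u = cfm_pair(x1, 0.25, rng)
    print("CFM loss of a perfect prediction:", cfm_loss(u, u))

    # Oracle field for a single target: on a straight path, u = (x1 - x_t) / (1 - t).
    def oracle(x: Vec, t: float) -> Vec:
        return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]

    for n in (16, 4, 2):
        print(n, "steps ->", [round(v, 6) for v in euler_sample(oracle, x0, n)])
```

With the oracle field, Euler integration lands on the target at any step count because the path is exactly straight. A learned vector field is not exactly straight, so cutting the step count normally adds discretization error. Fine-tuning the generator at a fixed step count, as PeriodWave-Turbo does, trains it to work with that coarse discretization.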

Paper Structure (31 sections, 4 equations, 1 figure, 9 tables)

Introduction
Related Works
Accelerating Methods for Few-Step Generator
Adversarial Feedback for Waveform Generation
PeriodWave-Turbo
Flow Matching for Waveform Generation
Adversarial Flow Matching Optimization
Few-step Generator Modification
Reconstruction Loss
Adversarial Training
Distillation Method
Final Loss
Model Size
Experiment and Result
Dataset
...and 16 more sections

Figures (1)

Figure 1: Overall architecture of PeriodWave-Turbo. The parameters of PeriodWave-Turbo are initialized from the pre-trained PeriodWave, which was trained with the flow matching objective. PeriodWave-Turbo is then modified into a few-step generator with a fixed number of steps and trained with a reconstruction loss and adversarial feedback. Compared to full GAN training, this can make training about 6× faster while achieving much better performance.
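The caption's training signal can be sketched as a single generator objective: a reconstruction loss plus adversarial feedback, together with the feature-matching term named in the TL;DR. This minimal pure-Python sketch assumes HiFi-GAN-style choices: least-squares GAN losses, an L1 reconstruction loss on mel-spectrogram values, and L1 feature matching over discriminator feature maps. The defaults `lambda_rec=45.0` and `lambda_fm=2.0` are HiFi-GAN's published weights, used here only as placeholders. PeriodWave-Turbo's actual weights, discriminators, and reconstruction features are not given in this section, and all function names are hypothetical.

```python
from typing import List, Sequence


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def lsgan_generator_loss(fake_scores: Sequence[float]) -> float:
    """Least-squares GAN generator term: push discriminator scores on generated audio toward 1."""
    return sum((1.0 - s) ** 2 for s in fake_scores) / len(fake_scores)


def lsgan_discriminator_loss(real_scores: Sequence[float], fake_scores: Sequence[float]) -> float:
    """Least-squares GAN discriminator term: real scores toward 1, generated scores toward 0."""
    real = sum((1.0 - s) ** 2 for s in real_scores) / len(real_scores)
    fake = sum(s ** 2 for s in fake_scores) / len(fake_scores)
    return real + fake


def feature_matching_loss(real_feats: List[List[float]], fake_feats: List[List[float]]) -> float:
    """Sum over discriminator layers of the L1 distance between real and generated feature maps."""
    return sum(l1(r, f) for r, f in zip(real_feats, fake_feats))


def generator_objective(
    real_mel: Sequence[float],
    fake_mel: Sequence[float],
    fake_scores: Sequence[float],
    real_feats: List[List[float]],
    fake_feats: List[List[float]],
    lambda_rec: float = 45.0,  # placeholder: HiFi-GAN's mel-loss weight, not PeriodWave-Turbo's
    lambda_fm: float = 2.0,  # placeholder: HiFi-GAN's feature-matching weight
) -> float:
    """Adversarial + weighted feature-matching + weighted reconstruction loss (assumed form)."""
    adv = lsgan_generator_loss(fake_scores)
    fm = feature_matching_loss(real_feats, fake_feats)
    rec = l1(real_mel, fake_mel)
    return adv + lambda_fm * fm + lambda_rec * rec
```

In a full fine-tuning loop under these assumptions, `fake_mel`, `fake_scores`, and `fake_feats` would be computed from audio produced by running the fixed-step sampler end to end, so gradients reach every step. The discriminator would be updated in alternation using `lsgan_discriminator_loss`.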
