PodAgent: A Comprehensive Framework for Podcast Generation

Yujia Xiao; Lei He; Haohan Guo; Fenglong Xie; Tan Lee

TL;DR

PodAgent tackles automated podcast-like audio generation by integrating a Host-Guest-Writer multi-agent framework, a voice-role matching system, and LLM-guided instruction-following speech synthesis to produce content-rich, expressive long-form dialogue. It introduces a comprehensive evaluation protocol for open-ended podcast content and voice quality, and demonstrates substantial improvements over direct GPT-4 generation in topic-discussion content and voice alignment, achieving an 87.4% voice-matching accuracy. The approach enables end-to-end production from topic to audio, with validated benefits in dialogue diversity, informativeness, and expressiveness, while acknowledging limitations in voice pool size and acoustic realism. Overall, PodAgent offers a scalable blueprint for automatic podcast creation with practical implications for content automation, voice replication ethics, and AI-assisted media production.

Abstract

Existing automatic audio generation methods struggle to generate podcast-like audio programs effectively. The key challenges lie in generating in-depth content and producing appropriate, expressive voices. This paper proposes PodAgent, a comprehensive framework for creating audio programs. PodAgent 1) generates informative topic-discussion content through a Host-Guest-Writer multi-agent collaboration system, 2) builds a voice pool for suitable voice-role matching, and 3) uses an LLM-enhanced speech synthesis method to generate expressive conversational speech. Given the absence of standardized evaluation criteria for podcast-like audio generation, we developed comprehensive assessment guidelines to evaluate the model's performance. Experimental results demonstrate PodAgent's effectiveness: it significantly surpasses direct GPT-4 generation in topic-discussion dialogue content, achieves an 87.4% voice-matching accuracy, and produces more expressive speech through LLM-guided synthesis. Demo page: https://podcast-agent.github.io/demo/. Source code: https://github.com/yujxx/PodAgent.
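The three stages named in the abstract can be pictured as a simple data flow from topic to synthesis requests. The sketch below is only an illustration under loud assumptions: every name in it (`Turn`, `write_script`, `match_voices`, `synthesize`, the trait sets) is hypothetical and not taken from the PodAgent code base, the LLM and TTS calls are replaced by placeholders, and the trait-overlap matching is a stand-in rather than the paper's actual voice-role matching method.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    role: str   # speaker role, e.g. "host" or "guest_1"
    text: str   # what the speaker says
    style: str  # free-text speaking-style instruction for the synthesis stage


def write_script(topic: str, guests: list[str]) -> list[Turn]:
    """Placeholder for the Host-Guest-Writer stage (real LLM calls omitted)."""
    turns = [Turn("host", f"Welcome! Today we discuss {topic}.", "warm, welcoming")]
    for guest in guests:
        turns.append(Turn(guest, f"My view on {topic} is ...", "thoughtful, measured"))
    return turns


def match_voices(
    roles: list[str],
    voice_pool: dict[str, set[str]],
    role_traits: dict[str, set[str]],
) -> dict[str, str]:
    """Toy voice-role matching: give each role the unused voice sharing the most traits.

    Assumes the pool holds at least as many voices as there are roles.
    """
    assignment: dict[str, str] = {}
    for role in roles:
        free = [v for v in voice_pool if v not in assignment.values()]
        assignment[role] = max(free, key=lambda v: len(voice_pool[v] & role_traits[role]))
    return assignment


def synthesize(turns: list[Turn], voices: dict[str, str]) -> list[tuple[str, str, str]]:
    """Placeholder synthesis stage: returns (voice_id, style, text) requests, no audio."""
    return [(voices[t.role], t.style, t.text) for t in turns]


# Hypothetical inputs, invented purely for illustration.
pool = {"v1": {"female", "calm"}, "v2": {"male", "energetic"}}
traits = {"host": {"male", "energetic"}, "guest_1": {"female", "calm"}}

turns = write_script("urban farming", ["guest_1"])
voices = match_voices(["host", "guest_1"], pool, traits)
requests = synthesize(turns, voices)
```

In this sketch the style string travels with each turn so that the synthesis stage can condition on it, which mirrors the idea of instruction-following speech synthesis described above without claiming anything about how PodAgent implements it.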
