Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Yixiao Zhang, Akira Maezawa, Gus Xia, Kazuhiko Yamamoto, Simon Dixon
TL;DR
The paper addresses the challenge of orchestrating multiple AI subsystems for iterative music creation by introducing Loop Copilot, an LLM-driven controller that selects and chains specialized backends while maintaining musical coherence via a Global Attribute Table (GAT). It formalizes the interaction with a stateful framework, demonstrates a two-stage generation/editing workflow, and presents a training-free method for iterative editing through model chaining. An empirical evaluation with eight participants, using the System Usability Scale (SUS) and the Technology Acceptance Model (TAM), reveals generally favorable usability and acceptance, while qualitative feedback identifies limitations in control granularity and integration with existing workflows. The work highlights the potential of conversational, multi-model orchestration to democratize music creation and suggests future work on richer editing tasks, DAW integration, and voice-based interactions to broaden impact.
Abstract
Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpret user intentions and select appropriate AI models for task execution. Each backend model is specialized for a specific task, and their outputs are aggregated to meet the user's requirements. To ensure musical coherence, essential attributes are maintained in a centralized table. We evaluate the effectiveness of the proposed system through semi-structured interviews and questionnaires, highlighting not only its utility in facilitating music creation but also its potential for broader applications.
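
The control flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `GlobalAttributeTable`, the backends `generate_loop` and `add_track`, and the keyword-based `select_task` (a stub standing in for the large language model) are all hypothetical names, and the attribute values are placeholders. The sketch only shows the general idea of a controller that routes each dialogue turn to a specialized backend while a shared table carries musical attributes across turns.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class GlobalAttributeTable:
    """Hypothetical stand-in for the centralized table of musical attributes."""
    attributes: Dict[str, str] = field(default_factory=dict)

    def update(self, **changes: str) -> None:
        self.attributes.update(changes)


# A backend receives the user request and the shared table,
# and returns a text description of what it produced.
Backend = Callable[[str, GlobalAttributeTable], str]


def generate_loop(request: str, gat: GlobalAttributeTable) -> str:
    # Placeholder for a generation model; records assumed attributes in the table.
    gat.update(genre="rock", tempo="120 bpm")
    return f"generated loop ({gat.attributes['genre']}, {gat.attributes['tempo']})"


def add_track(request: str, gat: GlobalAttributeTable) -> str:
    # Placeholder for an editing model; reads the table to stay coherent
    # with what earlier turns established.
    tempo = gat.attributes.get("tempo", "unknown tempo")
    return f"added track matching {tempo}"


BACKENDS: Dict[str, Backend] = {"generate": generate_loop, "add": add_track}


def select_task(request: str) -> str:
    """Keyword stub standing in for the LLM's interpretation of user intent."""
    return "add" if "add" in request.lower() else "generate"


def run_dialogue(requests: List[str]) -> List[str]:
    """Process a multi-round dialogue, sharing one attribute table across turns."""
    gat = GlobalAttributeTable()
    outputs = []
    for request in requests:
        task = select_task(request)
        outputs.append(BACKENDS[task](request, gat))
    return outputs


if __name__ == "__main__":
    for line in run_dialogue(["Make a rock loop", "Add a guitar track"]):
        print(line)
```

In this sketch the second turn never restates the tempo; the editing backend recovers it from the shared table, which is the kind of cross-turn coherence a centralized attribute table is meant to provide.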
