An Agent-Based Framework for Automated Higher-Voice Harmony Generation
Nia D'Souza Ganapathy, Arul Selvamani Shaja
TL;DR
Automating harmony generation is a complex problem that requires balancing music theory, style, and musical expressivity. The authors present a multi-agent framework that decomposes the task into four specialized roles (Librarian, Theorist, Composer, and Conductor) driven by four core models: Chord-Former, Harmony-GPT, Rhythm-Net, and a GAN-based Symbolic-to-Audio Synthesizer. Together these form an end-to-end pipeline with stateful data flow $S_{XML}\rightarrow S_{std} \rightarrow S_{harm} \rightarrow W_{audio}$. This modular architecture enhances interpretability, extensibility, and collaborative creativity, while aiming to produce musically coherent, context-aware higher-voice harmonies across genres. The framework’s data pipelines, model designs, and training regimes make up a concrete approach to automated harmony generation that bridges symbolic composition and realistic audio rendering, with potential applications for composers and educators. The work also outlines avenues for future work: real-time interactivity, hierarchical planning for long-range musical form, broader orchestration, and more diverse training data to widen stylistic coverage.
Abstract
Generating musically coherent and aesthetically pleasing harmony remains a significant challenge in algorithmic composition. This paper introduces an Agentic AI-enabled Higher Harmony Music Generator, a multi-agent system designed to create harmony in a collaborative and modular fashion. Our framework comprises four specialized agents: a Music-Ingestion Agent that parses and standardizes input musical scores; a Chord-Knowledge Agent, powered by Chord-Former (a Transformer model), that interprets complex chord symbols and provides their constituent notes; a Harmony-Generation Agent that uses Harmony-GPT and Rhythm-Net (an RNN) to compose a melodically and rhythmically complementary harmony line; and an Audio-Production Agent that employs a GAN-based Symbolic-to-Audio Synthesizer to render the final symbolic output as high-fidelity audio. By delegating specific tasks to specialized agents, the system mimics the collaborative process of human musicians. This modular, agent-based approach separates data processing, music-theoretic knowledge, composition, and audio synthesis into distinct stages, yielding a system that generates contextually appropriate higher-voice harmonies for given melodies.
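
To make the stateful data flow $S_{XML}\rightarrow S_{std} \rightarrow S_{harm} \rightarrow W_{audio}$ concrete, the following is a minimal sketch of how four agents might hand a shared state object from one stage to the next. It is not the authors' implementation. Every name in it (`PipelineState`, `ingest`, `harmonize`, `render`, `chord_tones`, `run_pipeline`) is hypothetical, and each learned model is replaced by a toy rule: a `pitch:chord` token parser stands in for MusicXML ingestion, a lookup table for Chord-Former, a "lowest chord tone above the melody" rule for Harmony-GPT and Rhythm-Net, and plain sine waves for the GAN-based synthesizer. Treating the Chord-Knowledge Agent as a service queried by the harmony stage, rather than as its own state transition, is also an assumption.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple


# Hypothetical state container; fields mirror S_XML, S_std, S_harm, W_audio.
@dataclass
class PipelineState:
    s_xml: str                                      # raw input score text
    s_std: Optional[List[Tuple[int, str]]] = None   # (MIDI pitch, chord symbol) events
    s_harm: Optional[List[int]] = None              # one harmony pitch per melody note
    w_audio: Optional[List[float]] = None           # rendered waveform samples
    log: List[str] = field(default_factory=list)    # names of agents that have run


# Toy stand-in for the Chord-Knowledge Agent: a lookup table, not a Transformer.
CHORD_TONES: Dict[str, Set[int]] = {
    "C": {0, 4, 7},
    "F": {5, 9, 0},
    "G": {7, 11, 2},
    "Am": {9, 0, 4},
}


def chord_tones(symbol: str) -> Set[int]:
    """Return the pitch classes (0-11) that make up a chord symbol."""
    return CHORD_TONES[symbol]


def ingest(state: PipelineState) -> PipelineState:
    """Music-Ingestion stand-in: parse 'pitch:chord' tokens (not real MusicXML)."""
    events = []
    for token in state.s_xml.split():
        pitch, symbol = token.split(":")
        events.append((int(pitch), symbol))
    state.s_std = events
    return state


def harmonize(state: PipelineState) -> PipelineState:
    """Harmony-Generation stand-in: lowest chord tone strictly above each melody note."""
    harmony = []
    for pitch, symbol in state.s_std:
        tones = chord_tones(symbol)
        candidate = pitch + 1
        while candidate % 12 not in tones:
            candidate += 1
        harmony.append(candidate)
    state.s_harm = harmony
    return state


def midi_to_hz(pitch: int) -> float:
    """Equal-temperament frequency with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12)


def render(state: PipelineState, sample_rate: int = 8000,
           samples_per_note: int = 800) -> PipelineState:
    """Audio-Production stand-in: mix melody and harmony as two sine waves."""
    samples = []
    for (melody, _), harmony in zip(state.s_std, state.s_harm):
        for n in range(samples_per_note):
            t = n / sample_rate
            samples.append(
                0.5 * math.sin(2 * math.pi * midi_to_hz(melody) * t)
                + 0.5 * math.sin(2 * math.pi * midi_to_hz(harmony) * t)
            )
    state.w_audio = samples
    return state


AGENTS: List[Callable[[PipelineState], PipelineState]] = [ingest, harmonize, render]


def run_pipeline(score_text: str) -> PipelineState:
    """Pass one state object through each agent in order, logging each step."""
    state = PipelineState(s_xml=score_text)
    for agent in AGENTS:
        state = agent(state)
        state.log.append(agent.__name__)
    return state
```

For example, `run_pipeline("60:C 64:C 67:G").s_harm` evaluates to `[64, 67, 71]`: E above C, G above E, and B above G. Keeping every intermediate result on one state object is what makes the flow inspectable, since any stage's output can be examined or replaced without rerunning the others.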
