
ByteComposer: a Human-like Melody Composition Method based on Language Model Agent

Xia Liang, Xingjian Du, Jiaju Lin, Pei Zou, Yuan Wan, Bilei Zhu

Abstract

Large Language Models (LLMs) have shown encouraging progress in multimodal understanding and generation tasks. However, how to design a human-aligned and interpretable melody composition system remains under-explored. To address this problem, we propose ByteComposer, an agent framework that emulates a human's creative pipeline in four separate steps: "Conception Analysis - Draft Composition - Self-Evaluation and Modification - Aesthetic Selection". This framework blends the interactive and knowledge-understanding features of LLMs with existing symbolic music generation models, yielding a melody composition agent comparable to human creators. We conduct extensive experiments on GPT4 and several open-source large language models, which substantiate our framework's effectiveness. Furthermore, professional music composers were engaged in multi-dimensional evaluations; the final results demonstrate that, across various facets of music composition, the ByteComposer agent attains the level of a novice melody composer.

Paper Structure

This paper contains 22 sections, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: Overview of the ByteComposer System. From raw textual input, the system's Expert Module first analyzes emotional sentiment and thematic context, extracting primary features. This information is then passed to the Generator Module, which employs deep learning techniques to transform these textual features into initial musical motifs, leveraging knowledge from music information retrieval (MIR). The Voter Module subsequently refines and evaluates the generated motifs, cross-referencing them with historical compositions and using real-time feedback loops. Finally, the Memory Module archives successful motifs, allowing the system to continuously learn and update its database, which in turn influences future compositions. This integration of modules is intended to produce musically coherent and emotionally resonant pieces tailored to the input.
  • Figure 2: The interplay between the system's components. Input text is processed leveraging music theory knowledge, common sense, and context. This data then informs the Expert and Voter modules, facilitated by their respective evaluation toolboxes. The Expert module focuses on understanding and providing feedback on the creation, while the Voter module sorts the potential musical candidates.
  • Figure 3: The Memory Module of ByteComposer.
  • Figure 4: Diverse agent configurations are enabled through the combination of different functional modules, further augmented by the support for custom model module components. This modularity not only facilitates a tailored approach to specific tasks but also enhances the system's adaptability to evolving requirements and novel applications.
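
The module flow described in the Figure 1 caption (Expert, Generator, Voter, Memory) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the keyword-based mood detection, the random note sampling, and the "smallest total leap" scoring rule are toy stand-ins for the LLM and symbolic music models the authors actually use. The self-evaluation and modification step is collapsed into the scoring function here.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Memory:
    """Hypothetical Memory module: archives selected motifs per prompt."""
    archive: list = field(default_factory=list)

    def store(self, prompt, motif):
        self.archive.append((prompt, motif))


def expert_analyze(prompt):
    """Toy Expert module: derive a mood and a scale from keywords (assumption)."""
    sad = any(word in prompt.lower() for word in ("sad", "lonely", "rain"))
    # MIDI pitches: A natural minor vs. C major, seven scale degrees each.
    scale = [57, 59, 60, 62, 64, 65, 67] if sad else [60, 62, 64, 65, 67, 69, 71]
    return {"mood": "sad" if sad else "bright", "scale": scale}


def generate_drafts(features, n_candidates=4, length=8, seed=0):
    """Toy Generator module: sample note sequences from the chosen scale."""
    rng = random.Random(seed)
    return [[rng.choice(features["scale"]) for _ in range(length)]
            for _ in range(n_candidates)]


def vote(candidates):
    """Toy Voter module: rank motifs, preferring smaller melodic leaps."""
    def total_leap(motif):
        return sum(abs(a - b) for a, b in zip(motif, motif[1:]))
    return sorted(candidates, key=total_leap)


def compose(prompt, memory):
    features = expert_analyze(prompt)    # conception analysis
    drafts = generate_drafts(features)   # draft composition
    ranked = vote(drafts)                # evaluation and selection
    best = ranked[0]
    memory.store(prompt, best)           # archive the selected motif
    return best


if __name__ == "__main__":
    memory = Memory()
    print(compose("a sad rainy evening", memory))
```

Keeping each module behind a plain function boundary mirrors the modularity noted in the Figure 4 caption: any one stand-in could be swapped for a model-backed component without changing `compose`.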