SongComposer: A Large Language Model for Lyric and Melody Generation in Song Composition

Shuangrui Ding; Zihan Liu; Xiaoyi Dong; Pan Zhang; Rui Qian; Junhao Huang; Conghui He; Dahua Lin; Jiaqi Wang

TL;DR

SongComposer presents a unified large language model for simultaneous lyric and melody generation in symbolic form. It introduces a word-level lyric-melody tuple representation, a scalar pitch initialization strategy, and a three-stage, structure-aware training pipeline to encode motif- and phrase-level song organization. The authors assemble SongCompose, a large bilingual dataset with precise lyric-melody alignments, and demonstrate that SongComposer outperforms GPT-4 and other baselines on lyric-to-melody, melody-to-lyrics, song continuation, and text-to-song tasks, supported by extensive ablations. Limitations are acknowledged regarding audio synthesis and multi-track accompaniment, with future work proposed to bridge symbolic and acoustic generation for end-to-end text-to-song production.

Abstract

Creating lyrics and melodies for the vocal track in a symbolic format, known as song composition, demands expert musical knowledge of melody, an advanced understanding of lyrics, and precise alignment between them. Despite achievements in sub-tasks such as lyric generation, lyric-to-melody generation, and melody-to-lyric generation, a unified model for song composition has not yet been achieved. In this paper, we introduce SongComposer, a pioneering step towards a unified song composition model that can readily create symbolic lyrics and melodies following instructions. SongComposer is a music-specialized large language model (LLM) that, for the first time, integrates the capability of simultaneously composing lyrics and melodies into LLMs by leveraging three key innovations: 1) a flexible tuple format for word-level alignment of lyrics and melodies, 2) an extended tokenizer vocabulary for song notes, with scalar initialization based on musical knowledge to capture rhythm, and 3) a multi-stage pipeline that captures musical structure, starting with motif-level melody patterns and progressing to phrase-level structure for improved coherence. Extensive experiments demonstrate that SongComposer outperforms advanced LLMs, including GPT-4, in tasks such as lyric-to-melody generation, melody-to-lyric generation, song continuation, and text-to-song creation. Moreover, we will release SongCompose, a large-scale dataset for training, containing paired lyrics and melodies in Chinese and English.

Paper Structure

This paper contains 29 sections, 7 equations, 15 figures, 8 tables.

Introduction
Related Work
SongComposer
Symbolic Representation for LLMs
Pitch Initialization
Progressive Structure-aware Training
Experiments
SongCompose Dataset
Training Details
Evaluation Setup
Experimental Results
Ablation Study
Conclusion
SongCompose Dataset
Pure-lyric Dataset
...and 14 more sections

Figures (15)

Figure 1: Overview of the song-related instruction-following composition by SongComposer. SongComposer utilizes symbolic song representation to compose melodies tailored to lyrics, craft lyrics to complement melodies, extend existing songs, and generate new songs from textual prompts.
Figure 2: (a) Symbolic song representation involves precise alignment of notes and lyrics; (b) The structure of a song often comprises motif-level and phrase-level concepts.
Figure 3: Visualization of attention distribution for different key/query types.
Figure 4: Memorization analysis of SongComposer.
Figure 5: Pipeline of paired lyric-melody data collection.
...and 10 more figures
