Table of Contents
Fetching ...

Agent-Driven Large Language Models for Mandarin Lyric Generation

Hong-Hsiang Liu, Yi-Wen Liu

TL;DR

A multi-agent system that decomposes the melody-to-lyric task into sub-tasks, with each agent controlling rhyme, syllable count, lyric-melody align-ment, and consistency is developed.

Abstract

Generative Large Language Models have shown impressive in-context learning abilities, performing well across various tasks with just a prompt. Previous melody-to-lyric research has been limited by scarce high-quality aligned data and unclear standard for creativeness. Most efforts focused on general themes or emotions, which are less valuable given current language model capabilities. In tonal contour languages like Mandarin, pitch contours are influenced by both melody and tone, leading to variations in lyric-melody fit. Our study, validated by the Mpop600 dataset, confirms that lyricists and melody writers consider this fit during their composition process. In this research, we developed a multi-agent system that decomposes the melody-to-lyric task into sub-tasks, with each agent controlling rhyme, syllable count, lyric-melody alignment, and consistency. Listening tests were conducted via a diffusion-based singing voice synthesizer to evaluate the quality of lyrics generated by different agent groups.

Agent-Driven Large Language Models for Mandarin Lyric Generation

TL;DR

A multi-agent system that decomposes the melody-to-lyric task into sub-tasks, with each agent controlling rhyme, syllable count, lyric-melody align-ment, and consistency is developed.

Abstract

Generative Large Language Models have shown impressive in-context learning abilities, performing well across various tasks with just a prompt. Previous melody-to-lyric research has been limited by scarce high-quality aligned data and unclear standard for creativeness. Most efforts focused on general themes or emotions, which are less valuable given current language model capabilities. In tonal contour languages like Mandarin, pitch contours are influenced by both melody and tone, leading to variations in lyric-melody fit. Our study, validated by the Mpop600 dataset, confirms that lyricists and melody writers consider this fit during their composition process. In this research, we developed a multi-agent system that decomposes the melody-to-lyric task into sub-tasks, with each agent controlling rhyme, syllable count, lyric-melody alignment, and consistency. Listening tests were conducted via a diffusion-based singing voice synthesizer to evaluate the quality of lyrics generated by different agent groups.
Paper Structure (14 sections, 3 equations, 5 figures, 2 tables, 1 algorithm)

This paper contains 14 sections, 3 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Forward Control and Backward Control for LLMs
  • Figure 2: Backward control process
  • Figure 3: The flow of lyric generation by angents
  • Figure 4: Distribution of NSMR and SMR in Mpop600.
  • Figure 5: Average rank of different agent groups, as reported by 22 listeners (lower is better)