
SongSong: A Time Phonograph for Chinese SongCi Music from Thousand of Years Away

Jiajia Li, Jiliang Hu, Ziyi Pan, Chong Chen, Zuchao Li, Ping Wang, Lefei Zhang

TL;DR

The proposed model first predicts the melody from the input SongCi, then separately generates the singing voice and accompaniment based on that melody, and finally combines all elements to create the final piece of music.

Abstract

Recent years have seen significant advances in music generation. However, existing models focus primarily on modern pop songs and struggle to produce ancient music with distinct rhythms and styles, such as ancient Chinese SongCi. In this paper, we introduce SongSong, to our knowledge the first music generation model capable of restoring Chinese SongCi. Our model first predicts the melody from the input SongCi, then separately generates the singing voice and accompaniment based on that melody, and finally combines all elements to create the final piece of music. Additionally, to address the lack of ancient music datasets, we create OpenSongSong, a comprehensive dataset of ancient Chinese SongCi music, featuring 29.9 hours of compositions by various renowned SongCi music masters. To assess SongSong's proficiency in performing SongCi, we randomly select 85 SongCi sentences that were not part of the training set and compare SongSong against music generation platforms such as Suno and SkyMusic. Both the subjective and the objective results indicate that our proposed model achieves leading performance in generating high-quality SongCi music.
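
The three-stage flow described in the abstract (melody first, then voice and accompaniment generated separately from that melody, then a final combination) can be illustrated with a toy sketch. Everything below is a hypothetical stand-in, not the authors' implementation: the function names, the `Note` type, the pentatonic placeholder melody, the sine-wave "voice" and "accompaniment", and the sum-and-normalize mixing step are all assumptions chosen only to make the data flow concrete.

```python
import math
from dataclasses import dataclass
from typing import List

SAMPLE_RATE = 8000  # Hz; arbitrary value for this toy example


@dataclass
class Note:
    pitch: int       # MIDI pitch number (assumed representation)
    duration: float  # seconds


def predict_melody(songci: str) -> List[Note]:
    """Stage 1 stand-in: one half-second note per character.

    Cycles through a pentatonic scale; this is NOT a learned model.
    """
    scale = [60, 62, 64, 67, 69]
    chars = [c for c in songci if not c.isspace()]
    return [Note(scale[i % len(scale)], 0.5) for i in range(len(chars))]


def render(melody: List[Note], semitone_shift: int, gain: float) -> List[float]:
    """Render a melody as sine tones (placeholder for a neural generator)."""
    samples: List[float] = []
    for note in melody:
        freq = 440.0 * 2 ** ((note.pitch + semitone_shift - 69) / 12)
        n = int(note.duration * SAMPLE_RATE)
        samples.extend(
            gain * math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n)
        )
    return samples


def generate_singing_voice(melody: List[Note]) -> List[float]:
    # Stage 2a stand-in: the melody at its written pitch.
    return render(melody, semitone_shift=0, gain=0.8)


def generate_accompaniment(melody: List[Note]) -> List[float]:
    # Stage 2b stand-in: the same melody one octave lower and quieter.
    return render(melody, semitone_shift=-12, gain=0.5)


def mix(voice: List[float], accompaniment: List[float]) -> List[float]:
    """Stage 3 stand-in: sample-wise sum, scaled down only if it would clip."""
    mixed = [v + a for v, a in zip(voice, accompaniment)]
    peak = max((abs(s) for s in mixed), default=0.0)
    return [s / peak for s in mixed] if peak > 1.0 else mixed


def songsong_pipeline(songci: str) -> List[float]:
    melody = predict_melody(songci)                 # 1. SongCi text -> melody
    voice = generate_singing_voice(melody)          # 2a. melody -> singing voice
    accompaniment = generate_accompaniment(melody)  # 2b. melody -> accompaniment
    return mix(voice, accompaniment)                # 3. combine into final audio
```

Calling `songsong_pipeline` on a five-character line yields 2.5 seconds of audio samples at the assumed sample rate. The point of the sketch is only that both generators condition on the same predicted melody, which keeps the voice and accompaniment aligned before they are combined.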

Paper Structure

This paper contains 13 sections, 12 equations, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: A piece of ancient Chinese SongCi music recorded using Chinese Gongche notation, which is vertically arranged and uses special Chinese symbols as musical notes. The blue box indicates the SongCi, and the red box indicates the notes.
  • Figure 2: The relationship between lyric, rhythm, and melody.
  • Figure 3: The structure of our proposed model, SongSong. The English meaning of the input SongCi is "How rare the moon, so round and clear! With cup in hand, I ask of the blue sky."
  • Figure 4: The design of the singing voice generation module.
  • Figure 5: The architecture of the adopted accompaniment generation module.
  • ...and 1 more figure