K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling

Haven Kim; Jongmin Jung; Dasaem Jeong; Juhan Nam

TL;DR

This work tackles the lack of public data for lyric translation, with a focus on K-pop, by introducing a Korean–English singable lyric dataset of about 1000 songs aligned line-by-line and section-by-section. It analyzes semantic and phonetic characteristics of K-pop translations, revealing strong section-wise semantic relationships and distinctive phoneme repetition, and presents a Transformer-based lyric translation model trained on the dataset. The results show that adding explicit syllable-count tokens (<SYL>) improves syllable alignment and makes it possible to generate singable translations without musical input, which highlights the dataset's value for both linguistic analysis and practical lyric generation. Overall, the dataset and methods provide a foundation for cross-genre lyric translation research and music localization applications.

Abstract

Lyric translation, a field studied for over a century, is now attracting computational linguistics researchers. We identified two limitations in previous studies. First, lyric translation studies have predominantly focused on Western genres and languages, with no previous study centering on K-pop despite its popularity. Second, the field of lyric translation suffers from a lack of publicly available datasets; to the best of our knowledge, no such dataset exists. To broaden the scope of genres and languages in lyric translation studies, we introduce a novel singable lyric translation dataset, approximately 89% of which consists of K-pop song lyrics. This dataset aligns Korean and English lyrics line-by-line and section-by-section. We leveraged this dataset to unveil unique characteristics of K-pop lyric translation, distinguishing it from other extensively studied genres, and to construct a neural lyric translation model, thereby underscoring the importance of a dedicated dataset for singable lyric translations.

Paper Structure (19 sections, 4 equations, 5 figures, 8 tables)

Introduction
Dataset
Source Corpora Collection
Human Alignment
Unpacking K-pop Translation
Semantic Pattern
Phoneme Repetition Pattern
Neural K-pop Translation
Training
Data Preprocessing
Evaluation Metrics
Quantitative Results
Qualitative Results
Conclusions
Ethics Statement
...and 4 more sections

Figures (5)

Figure 1: An illustration of K-pop translation, featuring "ID Peace B" by BoA, with English singable lyrics, Korean singable lyrics, and their corresponding non-singable English translations.
Figure 2: An example of the alignment task, using "Beautiful" by Amber, along with its English and Korean lyrics and their alignments with syllable counts in each language. We obtained the syllable counts by employing the syllables library for English. For Korean, we simply counted the number of characters, as each character in Korean corresponds to one syllable.
Figure 3: Density plots showing the distribution of line-by-line semantic textual similarity ($sts$) between the English and Korean lyrics for K-pop songs (both with and without untranslated English lyrics), animated musical songs, and theatre songs.
Figure 4: Data utilization order during the training phase
Figure 5: Automatic translations of "In & Out" by Red Velvet, generated by the baseline model (Junczys-Dowmunt et al., 2018) and by the semi-supervised and fine-tuned models, shown alongside the original lyrics, their pronunciation, and their meanings for comparison. When the generated lyrics have more syllables than the target count, two or more syllables are placed under a single note where a music expert considered this easy to arrange. When the generated lyrics have fewer syllables than the target count, one or more notes considered "musically removable" carry no lyrics in the score.
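
The syllable counting described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code. The function names are hypothetical. The Korean counter assumes that only precomposed Hangul syllable blocks should be counted (spaces and punctuation are skipped), and the English counter is a simple vowel-group heuristic standing in for the syllables library named in the caption, so its estimates may differ from that library's.

```python
import re


def count_korean_syllables(line: str) -> int:
    """Count precomposed Hangul syllable blocks (U+AC00..U+D7A3).

    Each block corresponds to one syllable; spaces, punctuation, and
    non-Hangul characters are ignored (an assumption of this sketch).
    """
    return sum(1 for ch in line if "\uac00" <= ch <= "\ud7a3")


def estimate_english_syllables(line: str) -> int:
    """Roughly estimate syllables as one per vowel group, at least one per word.

    This heuristic is a stand-in for a dedicated syllable estimator and
    can overcount words with a silent final "e".
    """
    words = re.findall(r"[a-z']+", line.lower())
    return sum(max(1, len(re.findall(r"[aeiouy]+", w))) for w in words)


if __name__ == "__main__":
    print(count_korean_syllables("사랑해요"))
    print(estimate_english_syllables("sing along"))
```

Comparing the two counts for an aligned line pair gives a quick check of whether a translated line is likely to fit the same melody as the original.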