Sing it, Narrate it: Quality Musical Lyrics Translation

Zhuorui Ye, Jinhan Li, Rongwu Xu

TL;DR

This paper introduces an inference-time optimization framework for translating entire songs, demonstrates significant improvements over baseline methods, and validates the effectiveness of each component of the approach.

Abstract

Translating lyrics for musicals presents unique challenges due to the need to ensure high translation quality while adhering to singability requirements such as length and rhyme. Existing song translation approaches often prioritize these singability constraints at the expense of translation quality, which is crucial for musicals. This paper aims to enhance translation quality while maintaining key singability features. Our method consists of three main components. First, we create a dataset to train reward models for the automatic evaluation of translation quality. Second, to enhance both singability and translation quality, we implement a two-stage training process with filtering techniques. Finally, we introduce an inference-time optimization framework for translating entire songs. Extensive experiments, including both automatic and human evaluations, demonstrate significant improvements over baseline methods and validate the effectiveness of each component in our approach.

Paper Structure

This paper contains 22 sections, 4 equations, 10 figures, 14 tables.

Figures (10)

  • Figure 1: The aspects we consider are length, rhyme, and translation quality. The proper length of a translated line is the number of notes, and the end rhymes of the lines (shown in parentheses) should preferably be of the same type (shown in the same color). The Google translation fails to follow the length constraint and misaligns with the music, as shown in the red boxes, and its rhymes do not match. Both the baseline and our results meet the length and rhyme constraints, but the baseline has inaccurate translations and inappropriate phrases, while our model generates higher-quality lyrics.
  • Figure 2: Overview of our pipeline. Our method has three key components: reward model training (top left), two-stage training of the translation model (top right), and the inference-time optimization framework (bottom). We use the reward models to filter the whole corpus into a Quality (Q) subset and a High-Quality (HQ) subset, and train our generation model first on the Q set and then on the HQ set. During inference, we generate many sentence-level translations and derive paragraph-level translations by optimizing a loss function that considers various aspects. We additionally run a second pass with the same process, generating more sentence translations conditioned on the best rhyme.
  • Figure 3: Changes in length accuracy, rhyme score, basic and advanced translation quality, and COMET score as the training set scale varies.
  • Figure 4: The distribution of musicals in the MusicalTransEval dataset (a) and the musical testing dataset (b).
  • Figure 5: Metrics for human labeling, page 1/3.
  • ...and 5 more figures
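
The inference-time step described in the Figure 2 caption (many sentence-level candidates per line, combined into a paragraph-level translation by optimizing a loss) might be sketched roughly as follows. This is only an illustrative sketch, not the paper's implementation: the `Candidate` fields, the form of `line_loss`, and the weights are all assumptions made for the example, since the actual loss function is not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str
    rhyme: str      # hypothetical rhyme-class label of the line's end rhyme
    length: int     # length of the translated line, in sung units
    quality: float  # hypothetical reward-model score, higher is better


def line_loss(c, target_len, rhyme, w_len=1.0, w_rhyme=1.0, w_quality=1.0):
    # Assumed loss: penalize length mismatch and a rhyme that differs from
    # the paragraph's shared rhyme, and reward translation quality.
    return (w_len * abs(c.length - target_len)
            + w_rhyme * (c.rhyme != rhyme)
            - w_quality * c.quality)


def select_paragraph(candidates, target_lens):
    """Pick one candidate per line and one shared rhyme with minimal total loss.

    candidates: one list of Candidate objects per lyric line.
    target_lens: desired length per line (e.g. the number of notes).
    Returns (total_loss, rhyme, chosen_candidates).
    """
    rhymes = sorted({c.rhyme for line in candidates for c in line})
    best = None
    for rhyme in rhymes:
        # With the shared rhyme fixed, the loss decomposes over lines,
        # so each line can be minimized independently.
        chosen = [min(line, key=lambda c: line_loss(c, n, rhyme))
                  for line, n in zip(candidates, target_lens)]
        total = sum(line_loss(c, n, rhyme)
                    for c, n in zip(chosen, target_lens))
        if best is None or total < best[0]:
            best = (total, rhyme, chosen)
    return best
```

Under these assumptions, the rhyme returned by `select_paragraph` is the kind of "best rhyme" a second pass could condition on when generating further candidates.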