Analyzing Byte-Pair Encoding on Monophonic and Polyphonic Symbolic Music: A Focus on Musical Phrase Segmentation

Dinh-Viet-Toan Le; Louis Bigo; Mikaela Keller

TL;DR

The findings show that the BPE training process is highly dependent on the instrumentation and that BPE “supertokens” succeed in capturing abstract musical content. In a musical phrase segmentation task, BPE notably improves performance in a polyphonic setting, but improves performance on monophonic tunes only within a specific range of BPE merges.

Abstract

Byte-Pair Encoding (BPE) is an algorithm commonly used in Natural Language Processing to build a vocabulary of subwords, and it has recently been applied to symbolic music. Given that symbolic music can differ significantly from text, particularly with polyphony, we investigate how BPE behaves with different types of musical content. This study provides a qualitative analysis of BPE's behavior across various instrumentations and evaluates its impact on a musical phrase segmentation task for both monophonic and polyphonic music. Our findings show that the BPE training process is highly dependent on the instrumentation and that BPE "supertokens" succeed in capturing abstract musical content. In a musical phrase segmentation task, BPE notably improves performance in a polyphonic setting, but improves performance on monophonic tunes only within a specific range of BPE merges.
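
To make the notion of BPE merges and "supertokens" concrete, here is a minimal Python sketch of the generic BPE training loop applied to a toy token sequence. It is not the authors' implementation: the token names (`Pitch_60`, `Dur_4`, and so on), the `+` separator used to name merged tokens, and the helper functions are all hypothetical and chosen only for illustration. The paper's tokenizations are not reproduced here.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return the most frequent adjacent token pair, or None if there is none."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def merge_pair(seq, pair):
    """Replace each left-to-right, non-overlapping occurrence of `pair` by one supertoken."""
    merged = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            # Hypothetical naming scheme: join the two tokens with "+".
            merged.append(seq[i] + "+" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(sequences, num_merges):
    """Run up to `num_merges` BPE steps; return the learned merges and the re-encoded data."""
    sequences = [list(seq) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(seq, pair) for seq in sequences]
    return merges, sequences


# Toy monophonic "tune" with made-up pitch and duration tokens (assumption).
tune = [
    "Pitch_60", "Dur_4",
    "Pitch_62", "Dur_4",
    "Pitch_60", "Dur_4",
    "Pitch_64", "Dur_8",
]

merges, encoded = train_bpe([tune], num_merges=1)
print(merges)
print(encoded)
```

Each merge adds one supertoken to the vocabulary, so the number of BPE merges directly controls the vocabulary size and how much musical content a single token can carry.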

Paper Structure (9 sections, 4 figures)

This paper contains 9 sections, 4 figures.

Introduction
Subword tokenization in symbolic music
Analyzing music BPE
Comparing text and music BPEs
Musical content carried by supertokens
Evaluating BPE on musical phrase segmentation
Musical phrase segmentation
Experiments
Conclusion

Figures (4)

Figure 1: (Top) Frequency of the created supertokens as the vocabulary size increases with the BPE steps, for different styles of music and multilingual text data. (Bottom) Average length of the supertokens created so far across BPE iterations, for musical and text data. The initial vocabulary size of each tokenization is indicated.
Figure 2: (Top) Most common start-of-phrase supertoken from Mozart's K.25 and Beethoven's WoO.68. (Bottom) 9-long common ending supertoken (10th most common) from Beethoven's WoO.73 and Mozart's K.179. The tokenization is Structured + intervals.
Figure 3: F1-score for start-of-phrase classification on the polyphonic (top) and monophonic (bottom) datasets.
Figure 4: Ratio of supertokens containing $n$ <Pitch> atomic elements in the vocabulary for each number of BPE merges.
