Evaluating Interval-based Tokenization for Pitch Representation in Symbolic Music Analysis

Dinh-Viet-Toan Le, Louis Bigo, Mikaela Keller

TL;DR

Absolute-pitch tokenizations can obscure relational pitch information and limit the modeling of musical structure. The paper introduces a general intervalization framework that converts absolute pitches to relative interval tokens with respect to a chosen reference sequence. It formalizes the method with I_ref and I_non_ref encodings across six intervalization strategies, and evaluates seven variants using Transformer encoders on three downstream MIR tasks, in both end-to-end and pre-trained settings. Intervalization improves performance across tasks and provides musically meaningful interpretability, although the best reference choice depends on the task. The work advances pitch representation in symbolic music analysis and points to extensions such as interval classes and broader reference choices for generation and analysis.

Abstract

Symbolic music analysis tasks are often performed by models originally developed for Natural Language Processing, such as Transformers. Such models require the input data to be represented as sequences, which is achieved through a process of tokenization. Tokenization strategies for symbolic music often rely on absolute MIDI values to represent pitch information. However, music research largely promotes the benefit of higher-level representations such as melodic contour and harmonic relations for which pitch intervals turn out to be more expressive than absolute pitches. In this work, we introduce a general framework for building interval-based tokenizations. By evaluating these tokenizations on three music analysis tasks, we show that such interval-based tokenizations improve model performances and facilitate their explainability.
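
To make the contrast between absolute and interval-based pitch tokens concrete, here is a minimal Python sketch. It is not the paper's tokenizer: the function name `intervalize_melody` and the token names `Pitch_*` and `Interval_*` are hypothetical, and the scheme shown (first note absolute, each later note encoded relative to the previous one) is only one simple way to choose a reference.

```python
from typing import List


def intervalize_melody(pitches: List[int]) -> List[str]:
    """Encode a monophonic MIDI pitch sequence as interval tokens.

    Assumption (illustrative only): the first note keeps its absolute
    pitch and every later note is encoded as the signed number of
    semitones from the previous note.
    """
    if not pitches:
        return []
    tokens = [f"Pitch_{pitches[0]}"]
    for prev, cur in zip(pitches, pitches[1:]):
        tokens.append(f"Interval_{cur - prev:+d}")
    return tokens


# The same arpeggio in two keys: only the first (absolute) token differs.
c_major = intervalize_melody([60, 64, 67])  # C4, E4, G4
g_major = intervalize_melody([67, 71, 74])  # G4, B4, D5
print(c_major)
print(g_major)
```

With absolute tokens the two arpeggios would share no pitch token at all, whereas here every token after the first is identical, which is the kind of relational information the abstract refers to.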

Paper Structure (12 sections, 10 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: Representations of the sheet music based on absolute and different variants of intervalization of the REMI tokenization. (Abs.: Absolute pitch encoding)
  • Figure 2: Performance comparison between absolute and intervalized tokenization strategies on the three downstream tasks with non-pre-trained and pre-trained models. The intervalized model is based on the reference resulting in the best performance.
  • Figure 3: Count of best intervalization references when comparing intervalized models using various references against the absolute tokenization. For each task, we consider two pre-trained and two end-to-end models trained on the tokenizations shown in the tokenization description table. This results in 12 comparisons per task, where each comparison involves three intervalization references tested against an absolute tokenization.
  • Figure 4: Histograms of vertical pitch interval tokens predicted as root position, first, second or third inversion. The tokenizer is a REMI intervalized tokenizer with the reference being the bottom-line encoded as absolute pitches and non-reference events encoded using vertical pitch intervals. The hatched part of a bar represents the proportion of false positives. Red highlights indicate the intervals that occur in each chord inversion. The notation (+1) indicates an additional octave.
  • Figure 5: Examples of intervalized tokenizations based on interval classes instead of pitch intervals. (Abs.: Absolute pitch encoding)
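
The encoding described in the Figure 4 caption (bottom line kept as absolute pitch, other notes encoded as vertical intervals above it) can be sketched as follows. This is an illustrative sketch only: the function name `intervalize_chord` and the token names `Pitch_*` and `VInterval_*` are hypothetical, and a single chord is treated in isolation rather than as part of a full REMI token stream.

```python
from typing import List


def intervalize_chord(pitches: List[int]) -> List[str]:
    """Encode simultaneous MIDI pitches relative to their lowest note.

    Assumption (illustrative only): the lowest note is the reference and
    keeps its absolute pitch; every other note becomes its distance in
    semitones above that reference.
    """
    if not pitches:
        return []
    ordered = sorted(pitches)
    bass = ordered[0]
    tokens = [f"Pitch_{bass}"]
    tokens += [f"VInterval_{p - bass}" for p in ordered[1:]]
    return tokens


# A C major triad in root position, first inversion and second inversion.
print(intervalize_chord([60, 64, 67]))  # C4, E4, G4
print(intervalize_chord([64, 67, 72]))  # E4, G4, C5
print(intervalize_chord([67, 72, 76]))  # G4, C5, E5
```

Each inversion of the triad yields a different set of vertical intervals (4 and 7, 3 and 8, 5 and 9 semitones), so the interval tokens themselves carry the cue that a chord-inversion classifier needs.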