The GigaMIDI Dataset with Features for Expressive Music Performance Detection

Keon Ju Maverick Lee, Jeff Ens, Sara Adkins, Pedro Sarmento, Mathieu Barthet, Philippe Pasquier

TL;DR

This work addresses the challenge of distinguishing expressive from non-expressive MIDI performances at the track level, using a massive symbolic music corpus, GigaMIDI, and a set of novel heuristics. The authors introduce three expressiveness detectors (DNVR, DNODR, and NOMML), with NOMML showing perfect separation on ground-truth data and enabling the creation of a large expressive subset of 1,655,649 tracks. They provide extensive dataset statistics, standardized preprocessing, and a public HuggingFace release to support MIR and symbolic music research. The work highlights practical uses in symbolic music generation, data mining, and digital musicology, while acknowledging biases and limitations in ground-truth coverage and instrument representation. Overall, NOMML emerges as a robust, scalable metric for expressive performance detection across General MIDI (GM) instruments, supporting future studies of expressive generation and analysis in large symbolic corpora.

Abstract

The Musical Instrument Digital Interface (MIDI), introduced in 1983, revolutionized music production by allowing computers and instruments to communicate efficiently. MIDI files encode musical instructions compactly, which makes music easy to share. They also benefit Music Information Retrieval (MIR), supporting research on music understanding, computational musicology, and generative music. The GigaMIDI dataset contains over 1.4 million unique MIDI files, encompassing 1.8 billion MIDI note events and over 5.3 million MIDI tracks. GigaMIDI is currently the largest collection of symbolic music in MIDI format available for research purposes under fair dealing. Distinguishing between non-expressive and expressive MIDI tracks is challenging, as MIDI files do not inherently make this distinction. To address this issue, we introduce a set of innovative heuristics for detecting expressive music performance. These include the Distinctive Note Velocity Ratio (DNVR) heuristic, which analyzes MIDI note velocity; the Distinctive Note Onset Deviation Ratio (DNODR) heuristic, which examines deviations in note onset times; and the Note Onset Median Metric Level (NOMML) heuristic, which evaluates onset positions relative to metric levels. Our evaluation demonstrates that these heuristics effectively differentiate between non-expressive and expressive MIDI tracks. Furthermore, after evaluation, we use the NOMML heuristic to create the most substantial expressive MIDI dataset. This curated version of GigaMIDI comprises the expressively-performed instrument tracks detected by NOMML and covers all General MIDI instruments. It totals 1,655,649 tracks, or 31% of the tracks in GigaMIDI.
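To make the first two heuristics more concrete, here is a minimal Python sketch of ratios in the spirit of DNVR and DNODR. The abstract does not give the formulas, so everything here is an assumption made for illustration: the function names, the normalization by the 127 possible note-on velocities, the sixteenth-note grid, and the normalization by the grid step are not taken from the paper. NOMML is omitted because the abstract gives too little detail to sketch it.

```python
from typing import Sequence

# Assumption: note-on velocities range over 1..127 (velocity 0 acts as note-off).
POSSIBLE_VELOCITIES = 127


def distinct_velocity_ratio(velocities: Sequence[int]) -> float:
    """Fraction of the 127 possible note-on velocities that a track uses."""
    used = {v for v in velocities if 1 <= v <= 127}
    return len(used) / POSSIBLE_VELOCITIES


def distinct_onset_deviation_ratio(
    onsets_ticks: Sequence[int], tpqn: int, grid_per_quarter: int = 4
) -> float:
    """Fraction of possible grid deviations that a track's onsets use.

    Each onset is compared with its nearest grid position, where the grid
    splits a quarter note into `grid_per_quarter` steps (an assumed
    sixteenth-note grid by default). The signed deviation can take exactly
    `step` distinct values, so the result lies between 0.0 and 1.0.
    """
    if tpqn <= 0 or grid_per_quarter <= 0 or tpqn % grid_per_quarter != 0:
        raise ValueError("tpqn must be a positive multiple of grid_per_quarter")
    step = tpqn // grid_per_quarter
    deviations = set()
    for tick in onsets_ticks:
        offset = tick % step
        # Map the offset to a signed distance from the nearest grid line.
        deviations.add(offset if offset <= step // 2 else offset - step)
    return len(deviations) / step


# A quantized track: one velocity, every onset exactly on the grid.
quantized_velocity = distinct_velocity_ratio([64, 64, 64, 64])
quantized_onset = distinct_onset_deviation_ratio([0, 120, 240, 360], tpqn=480)

# A performed track: varied velocities, onsets slightly off the grid.
performed_velocity = distinct_velocity_ratio([60, 72, 85, 91])
performed_onset = distinct_onset_deviation_ratio([3, 118, 245, 361], tpqn=480)

print(quantized_velocity < performed_velocity)  # True
print(quantized_onset < performed_onset)        # True
```

Under these assumptions, a quantized, fixed-velocity track scores near the minimum on both ratios, while a human performance spreads over more distinct values. Any threshold separating the two classes would have to be calibrated on ground-truth data, as the abstract's evaluation implies.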

Paper Structure

This paper contains 33 sections, 1 equation, 13 figures, 7 tables, 2 algorithms.

Figures (13)

  • Figure 1: Four classes (NE = non-expressive, EO = expressive-onset, EV = expressive-velocity, and EP = expressively-performed) assigned by the paper's heuristics for expressive performance detection of MIDI tracks in GigaMIDI.
  • Figure 2: Distribution of the duration in bars of the files from each subset of the GigaMIDI dataset. The X-axis is clipped to 300 for better readability.
  • Figure 3: Distribution of files in GigaMIDI according to (a) MIDI notes, and (b) ticks per quarter note (TPQN).
  • Figure 4: Musicmap style topology.
  • Figure 5: Distribution of musical style in GigaMIDI.
  • ...and 8 more figures