Improved symbolic drum style classification with grammar-based hierarchical representations
Léo Géré, Philippe Rigaux, Nicolas Audebert
TL;DR
This work tackles the problem of representing symbolic MIDI data for deep learning tasks, specifically drumming style classification, by moving beyond common tokenization and piano-roll encodings. It introduces a Linearized Rhythmic Tree (LRT) derived from a context-free musical grammar and augments it with a tree-based positional encoding (TBPE) to preserve hierarchical rhythm information in Transformer models. Empirical results on GrooveMIDI show that LRT with TBPE yields competitive or superior performance with roughly an order of magnitude fewer parameters than comparable LSTM baselines, and that it is more data-efficient than token-based or piano-roll representations. The findings suggest that grammar-informed symbolic representations can enable more compact, rhythm-aware models with strong generalization for music style classification and potentially for other symbolic-music tasks.
Abstract
Deep learning models have become a critical tool for the analysis and classification of musical data. These models operate either on the audio signal, e.g. a waveform or spectrogram, or on a symbolic representation, such as MIDI. In the latter case, musical information is often reduced to basic features, i.e. durations, pitches and velocities. Most existing works then rely on generic tokenization strategies borrowed from classical natural language processing, or on matrix representations, e.g. the piano roll. In this work, we evaluate how enriched representations of symbolic data affect deep models, i.e. Transformers and RNNs, for music style classification. In particular, we examine representations that make explicit the musical information that is only implicit in MIDI-like encodings, such as rhythmic organization, and show that they outperform generic tokenization strategies. We introduce a new tree-based representation of MIDI data built upon a context-free musical grammar. We show that this grammar-based representation accurately encodes high-level rhythmic information and outperforms existing encodings on the GrooveMIDI Dataset for drumming style classification, while being more compact and parameter-efficient.
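To make the idea of a linearized tree with tree-based positions more concrete, the following is a minimal illustrative sketch. It is not the LRT or TBPE defined in this work: the nested-list tree format, the `div<k>` token names, and the `linearize` helper are all assumptions chosen for illustration. It only shows how a hierarchical rhythm could be flattened depth-first into a token sequence while keeping, for each token, its path from the root, which a positional encoding could then consume.

```python
from typing import List, Tuple, Union

# Hypothetical rhythm tree format (an assumption, not the paper's grammar):
# a leaf is an event label (str); an inner node is a list of children
# that split the parent's time span into equal parts.
Tree = Union[str, List["Tree"]]
Path = Tuple[int, ...]


def linearize(tree: Tree, path: Path = ()) -> List[Tuple[str, Path]]:
    """Flatten a rhythm tree depth-first into (token, root-to-node path) pairs."""
    if isinstance(tree, str):
        # Leaf: emit the event label together with its position in the tree.
        return [(tree, path)]
    # Inner node: emit a subdivision token, then recurse into each child.
    out: List[Tuple[str, Path]] = [(f"div{len(tree)}", path)]
    for index, child in enumerate(tree):
        out.extend(linearize(child, path + (index,)))
    return out


# One bar split into two halves; the second half is split in two again.
bar: Tree = ["kick", ["snare", "rest"]]
pairs = linearize(bar)
tokens = [token for token, _ in pairs]
paths = [node_path for _, node_path in pairs]

print(tokens)
print(paths)
```

In this sketch, the token sequence plays the role of the model input and the paths carry the hierarchical position of each token. Unlike a flat sequence index, a path records which subdivision of the bar an event belongs to.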
