Beat this! Accurate beat tracking without DBN postprocessing
Francesco Foscarin, Jan Schlüter, Gerhard Widmer
TL;DR
This work addresses beat and downbeat tracking, aiming for broad generality across diverse music without relying on Dynamic Bayesian Network (DBN) postprocessing. It introduces a model with roughly 20 million parameters that combines a frontend, partial transformers (attending over either frequency or time), rotary positional embeddings, and a shift-tolerant loss, and achieves state-of-the-art F1 without a DBN on 18 datasets. Ablations show that the shift-tolerant loss, the partial transformers, and data augmentation are key to performance. Continuity metrics suffer on complex pieces, however, which suggests a trade-off between local accuracy and global periodicity. The authors release open-source code, pretrained models, and datasets to invite community improvement, and they outline future directions: model compression, loss functions that enforce periodicity, and better dataset quality.
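To make the F1 figure mentioned above concrete, here is a minimal sketch of a beat-tracking F1 score in Python. It is an illustration under stated assumptions, not the authors' evaluation code: the function name `beat_f1` is hypothetical, the 70 ms tolerance window is a common convention in beat evaluation and is assumed here, and the greedy matching of sorted beat times is a simplification.

```python
def beat_f1(reference, estimated, window=0.07):
    """F1 between two lists of beat times in seconds.

    Assumption: an estimated beat is a hit if it lies within
    `window` seconds of a not-yet-matched reference beat.
    """
    ref = sorted(reference)
    est = sorted(estimated)
    i = j = hits = 0
    # Walk both sorted lists, matching each beat at most once.
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= window:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early to match this or any later reference
        else:
            i += 1  # reference beat was missed
    if hits == 0:
        return 0.0
    precision = hits / len(est)
    recall = hits / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    annotated = [0.5, 1.0, 1.5, 2.0]
    predicted = [0.52, 1.0, 1.62, 2.05]
    print(beat_f1(annotated, predicted))
```

Because each beat is scored on its own, a measure of this kind rewards local accuracy but says nothing about whether correct beats form long uninterrupted runs, which is what continuity metrics capture.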
Abstract
We propose a system for tracking beats and downbeats with two objectives: generality across a diverse range of music, and high accuracy. We achieve generality by training on multiple datasets (including solo instrument recordings, pieces with time signature changes, and classical music with high tempo variations) and by removing the commonly used Dynamic Bayesian Network (DBN) postprocessing, which introduces constraints on the meter and tempo. For high accuracy, among other improvements, we develop a loss function tolerant to small time shifts of annotations, and an architecture alternating convolutions with transformers either over frequency or time. Our system surpasses the current state of the art in F1 score despite using no DBN. However, it can still fail, especially for difficult and underrepresented genres, and it performs worse on continuity metrics. We therefore publish our model, code, and preprocessed datasets, and invite others to beat this.
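The idea of a loss that tolerates small time shifts of annotations can be sketched in a few lines of Python. This is one possible formulation, assumed for illustration rather than taken from the published code: the function name `shift_tolerant_bce`, the `tolerance` parameter (in frames), and the choice to ignore negative frames near an annotation are all assumptions.

```python
import math


def shift_tolerant_bce(preds, targets, tolerance=3, eps=1e-7):
    """Binary cross-entropy that forgives small annotation shifts.

    preds:   per-frame beat probabilities in (0, 1)
    targets: per-frame labels, 1 at annotated beats and 0 elsewhere
    Assumption: a positive frame is scored against the highest
    prediction within +/- `tolerance` frames, and negative frames
    inside that neighbourhood are left out of the loss.
    """
    n = len(preds)
    positives = [i for i, t in enumerate(targets) if t == 1]
    total, count = 0.0, 0
    near_positive = set()
    for i in positives:
        lo, hi = max(0, i - tolerance), min(n, i + tolerance + 1)
        near_positive.update(range(lo, hi))
        best = max(preds[lo:hi])  # best prediction near the annotation
        total += -math.log(max(best, eps))
        count += 1
    for i in range(n):
        if i in near_positive:
            continue  # do not penalise activity close to an annotation
        total += -math.log(max(1.0 - preds[i], eps))
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
    on_time = [0.01, 0.01, 0.01, 0.9, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]
    one_late = [0.01, 0.01, 0.01, 0.01, 0.9, 0.01, 0.01, 0.01, 0.01, 0.01]
    print(shift_tolerant_bce(on_time, labels, tolerance=1))
    print(shift_tolerant_bce(one_late, labels, tolerance=1))
```

With `tolerance=1`, a peak that lands one frame after the annotation costs the same as a peak exactly on it, while setting `tolerance=0` recovers ordinary frame-wise cross-entropy and penalises the shifted peak.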
