Keep the beat going: Automatic drum transcription with momentum
Alisha L. Foster, Robert J. Webber
TL;DR
This paper addresses automatic drum transcription (ADT) with a partially fixed nonnegative matrix factorization (PFNMF) model, which holds the drum components fixed while a harmonic block models the remaining spectral content. It compares two nonnegativity-preserving optimizers, a multiplicative update rule (MUR) and projected gradient descent with momentum (NeNMF), derives theoretical guarantees for them, and adapts NeNMF to PFNMF via inner-subproblem optimization. Empirically, NeNMF achieves higher transcription accuracy than MUR on ENST-Drums and on a track from the author’s band, with comparable runtimes under a fixed budget; theoretically, it satisfies stronger convergence guarantees. The work advances interpretable ADT by combining PFNMF with robust optimization, and it highlights NeNMF as the preferred approach for future PFNMF-based drum transcription and related applications.
Abstract
How can we process a piece of recorded music to detect and visualize the onset of each instrument? A simple, interpretable approach is based on partially fixed nonnegative matrix factorization (NMF). Despite its simplicity, partially fixed NMF is challenging to apply because the associated optimization problem is high-dimensional and non-convex. This paper explores two optimization approaches that preserve the nonnegative structure: a multiplicative update rule and projected gradient descent with momentum. Both techniques are adapted from the previous literature, but they have not been fully developed for partially fixed NMF until now. Results indicate that projected gradient descent with momentum achieves the higher accuracy of the two methods, and it satisfies stronger local convergence guarantees.
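
To make the two optimization approaches concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a Frobenius-norm loss, a spectrogram V approximated by [W_drum, W_harm] H with W_drum held fixed, a step size of 1/L in the momentum solver, and a fixed number of inner iterations. All function names (pfnmf_loss, mur_sweep, nnls_momentum, momentum_sweep) and the random test data are hypothetical and chosen only for illustration.

```python
import numpy as np

def pfnmf_loss(V, W_drum, W_harm, H):
    """Frobenius loss 0.5 * ||V - [W_drum, W_harm] @ H||_F^2 (assumed objective)."""
    W = np.hstack([W_drum, W_harm])
    return 0.5 * np.linalg.norm(V - W @ H) ** 2

def mur_sweep(V, W_drum, W_harm, H, eps=1e-12):
    """One multiplicative-update sweep; the drum templates W_drum are never modified."""
    n_drums = W_drum.shape[1]
    W = np.hstack([W_drum, W_harm])
    H = H * (W.T @ V) / (W.T @ W @ H + eps)           # update all activations
    H_harm = H[n_drums:]
    W_harm = W_harm * (V @ H_harm.T) / (W @ H @ H_harm.T + eps)  # harmonic block only
    return W_harm, H

def nnls_momentum(B, A, X0, n_inner=20):
    """Approximately solve min_{X >= 0} 0.5 * ||B - A @ X||_F^2
    by projected gradient descent with Nesterov-style momentum."""
    AtA, AtB = A.T @ A, A.T @ B
    lip = np.linalg.norm(AtA, 2) + 1e-12   # Lipschitz constant of the gradient
    X, Y, t = X0, X0, 1.0
    for _ in range(n_inner):
        X_new = np.maximum(Y - (AtA @ Y - AtB) / lip, 0.0)   # gradient step, then projection
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        Y = X_new + ((t - 1.0) / t_new) * (X_new - X)        # momentum extrapolation
        X, t = X_new, t_new
    return X

def momentum_sweep(V, W_drum, W_harm, H, n_inner=20):
    """One alternating sweep: solve for H, then for W_harm, with W_drum fixed."""
    n_drums = W_drum.shape[1]
    W = np.hstack([W_drum, W_harm])
    H = nnls_momentum(V, W, H, n_inner)
    # Residual after removing the fixed drum part (may contain negative entries).
    R = V - W_drum @ H[:n_drums]
    H_harm = H[n_drums:]
    W_harm = nnls_momentum(R.T, H_harm.T, W_harm.T, n_inner).T
    return W_harm, H

# Illustrative run on random data standing in for a magnitude spectrogram.
rng = np.random.default_rng(0)
W_drum = rng.random((64, 3))     # hypothetical fixed drum templates
V = rng.random((64, 200))        # hypothetical spectrogram (frequency x time)
W_harm0, H0 = rng.random((64, 2)), rng.random((5, 200))
for sweep in (mur_sweep, momentum_sweep):
    W_harm, H = W_harm0, H0
    for _ in range(30):
        W_harm, H = sweep(V, W_drum, W_harm, H)
    print(sweep.__name__, pfnmf_loss(V, W_drum, W_harm, H))
```

In this sketch, the first rows of H (one per drum template) play the role of drum activations over time, from which onsets would be read off. Both sweeps keep every factor entrywise nonnegative: the multiplicative rule by rescaling with nonnegative ratios, the momentum solver by projecting onto the nonnegative orthant after each gradient step.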
