Keep the beat going: Automatic drum transcription with momentum

Alisha L. Foster, Robert J. Webber

TL;DR

This paper addresses automatic drum transcription (ADT) by formulating a partially fixed nonnegative matrix factorization (PFNMF) model that fixes the drum components while modeling the remaining spectral content with a harmonic block. It compares two nonnegativity-preserving optimizers, a multiplicative update rule (MUR) and projected gradient descent with momentum (NeNMF), deriving theoretical guarantees for each and adapting NeNMF to PFNMF via inner-subproblem optimization. Empirically, NeNMF achieves higher transcription accuracy than MUR on ENST-Drums and on a track from the author’s band, with comparable runtimes under a fixed budget; theoretically, it satisfies stronger convergence guarantees. The work advances interpretable ADT by pairing PFNMF with a more reliable optimizer, and it recommends NeNMF for future PFNMF-based drum transcription and related applications.

Abstract

How can we process a piece of recorded music to detect and visualize the onset of each instrument? A simple, interpretable approach is based on partially fixed nonnegative matrix factorization (NMF). Yet despite the method's simplicity, partially fixed NMF is challenging to apply because the associated optimization problem is high-dimensional and non-convex. This paper explores two optimization approaches that preserve the nonnegative structure: a multiplicative update rule and projected gradient descent with momentum. These techniques are derived from the previous literature, but they have not been fully developed for partially fixed NMF before now. Results indicate that projected gradient descent with momentum achieves the higher accuracy of the two methods and satisfies stronger local convergence guarantees.
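To make "projected gradient descent with momentum" concrete, the following is a minimal sketch of a Nesterov-style accelerated projected gradient loop for the activation subproblem $\min_{\bm{H} \geq 0} \tfrac{1}{2}\lVert \bm{V} - \bm{WH} \rVert_{\rm F}^2$ with $\bm{W}$ held fixed. It illustrates the general technique only and is not claimed to match the paper's algorithm line for line. The function names (`momentum_pgd`, `sq_error`) are hypothetical, the step size uses the Frobenius norm of $\bm{W}^\top\bm{W}$ as a conservative stand-in for its spectral norm, and the default of 10 iterations is an assumption chosen to echo the fixed inner-iteration count reported for NeNMF. Pure Python is used to avoid dependencies; an array library would be the practical choice.

```python
import math


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sq_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def momentum_pgd(V, W, H0, n_iter=10):
    """Accelerated projected gradient for min_{H >= 0} 0.5 * ||V - WH||_F^2.

    W is held fixed. The gradient at Y is (W^T W) Y - W^T V.
    """
    Wt = transpose(W)
    WtW = matmul(Wt, W)
    WtV = matmul(Wt, V)
    # Assumption: the Frobenius norm upper-bounds the spectral norm of W^T W,
    # so 1 / L is a safe (if conservative) step size.
    L = math.sqrt(sum(x * x for row in WtW for x in row))
    H = [row[:] for row in H0]  # current iterate
    Y = [row[:] for row in H0]  # extrapolated (momentum) point
    t = 1.0
    for _ in range(n_iter):
        G = matmul(WtW, Y)
        # Gradient step from Y, then projection onto the nonnegative orthant.
        H_new = [[max(0.0, y - (g - b) / L) for y, g, b in zip(ry, rg, rb)]
                 for ry, rg, rb in zip(Y, G, WtV)]
        # Nesterov momentum weight.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new
        Y = [[hn + beta * (hn - h) for hn, h in zip(rn, rh)]
             for rn, rh in zip(H_new, H)]
        H, t = H_new, t_new
    return H
```

Calling `momentum_pgd(V, W, H0)` with a nonnegative starting point `H0` returns a nonnegative activation matrix. Unlike a multiplicative update, the accelerated iteration is not guaranteed to decrease the error at every step, so progress is best judged over several iterations.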

Paper Structure

This paper contains 18 sections, 3 theorems, 24 equations, 5 figures, 3 tables, 3 algorithms.

Key Result

Theorem 3.1

The square Frobenius norm $\lVert \bm{V} -\bm{WH} \rVert_{\rm F}^2$ is nonincreasing under the multiplicative update rule
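The monotonicity claim in Theorem 3.1 can be checked numerically on a small example. The sketch below assumes the standard multiplicative updates from the NMF literature, $\bm{H} \leftarrow \bm{H} \odot (\bm{W}^\top \bm{V}) \oslash (\bm{W}^\top \bm{W} \bm{H})$ followed by $\bm{W} \leftarrow \bm{W} \odot (\bm{V} \bm{H}^\top) \oslash (\bm{W} \bm{H} \bm{H}^\top)$, which may differ in notation from the paper's statement. The function names (`mur_step`, `run_mur`) and the small `eps` guard in the denominator are illustrative assumptions, and pure Python is used only to keep the example dependency-free.

```python
import random


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sq_frobenius_error(V, W, H):
    """Squared Frobenius norm of V - WH."""
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))


def scale(X, num, den, eps=1e-12):
    """Entrywise X * num / den; eps (an assumption) guards against 0/0."""
    return [[x * n / (d + eps) for x, n, d in zip(rx, rn, rd)]
            for rx, rn, rd in zip(X, num, den)]


def mur_step(V, W, H):
    """One multiplicative update of H, then of W using the new H."""
    Wt = transpose(W)
    H = scale(H, matmul(Wt, V), matmul(matmul(Wt, W), H))
    Ht = transpose(H)
    W = scale(W, matmul(V, Ht), matmul(W, matmul(H, Ht)))
    return W, H


def run_mur(V, rank, n_iter=50, seed=0):
    """Run MUR from a positive random start and record the error history."""
    rng = random.Random(seed)
    m, n = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[rng.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    errors = [sq_frobenius_error(V, W, H)]
    for _ in range(n_iter):
        W, H = mur_step(V, W, H)
        errors.append(sq_frobenius_error(V, W, H))
    return W, H, errors
```

For a nonnegative matrix `V`, the list returned as `errors` by `run_mur(V, rank=2)` should be nonincreasing up to floating-point rounding, and the factors stay nonnegative because each update multiplies by a nonnegative ratio.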

Figures (5)

  • Figure 1: Automatic drum transcription (ADT) system applied to the first 30 seconds of the song "Every Now and Then" to visualize the onsets of snare and bass drums.
  • Figure 2: Plot of the magnitudes of the frequencies for the basis components in $\bm{W}_D$ for the snare and bass drums for "Every Now and Then."
  • Figure 3: Photograph of the author's drum kit with instruments annotated by hand. The snare and bass drums were included in the automatic drum transcription for the song "Every Now and Then", as described in \ref{sec:drum}.
  • Figure 4: Square Frobenius norm error as a function of runtime for MUR and NeNMF. The horizontal axis shows the number of iterations for MUR and the number of inner iterations ($10\times$ the number of outer iterations) for NeNMF. Note that the number of inner OGM iterations per outer iteration for NeNMF is fixed at 10. The error for ENST-Drums (left) is averaged across 28 tracks, while the error for "Every Now and Then" (right) is for a single track.
  • Figure 5: Entry values of drum activation matrix $\bm{H}_D$ plotted along time, optimized with NeNMF (top) and MUR (bottom) for the first 30 seconds of "Every Now and Then." The markers represent ground-truth annotations, and large entry values of $\bm{H}_D$ represent detected onsets.

Theorems & Definitions (4)

  • Theorem 3.1: Convergence of the original MUR algorithm \cite{lee2000algorithm}
  • Theorem 3.2: Convergence of the MUR algorithm for partially fixed NMF
  • Theorem 3.3: Convergence of OGM \cite{GuanNeNMF}
  • proof