Musical Score Following using Statistical Inference

Josephine Cowley

TL;DR

The paper addresses real-time score following by introducing a two-stage framework that first uses a Gaussian Process with a Spectral Mixture kernel to infer likely notes from short audio frames, and then maps these in real time to score positions via a duration-sensitive Hidden Markov Model and a Windowed Viterbi algorithm. The approach demonstrates successful tracking across solo piano and other instruments, highlighting the viability of GP-based statistical inference in online Music Information Retrieval. Key contributions include the first proof-of-concept for applying Gaussian Processes to score following, a concrete two-stage architecture, and an open-source implementation with a real-time score renderer. The work advances MIR by showing how structured priors on harmonic content can be integrated into real-time inference, with potential impact on automatic page turning, accompaniment, and performance analytics.

Abstract

Musical score following is the real-time mapping of a performance to corresponding locations in a musical score. Score following can be used in a variety of applications including automatic page turning and real-time accompaniment. This report presents a novel approach for score following motivated by Wilson and Adams's 2013 paper, which introduces Spectral Mixture (SM) kernels for Gaussian Process (GP) regression. Since the SM kernel is derived from a Mixture of Gaussians in the frequency domain, it is particularly suitable for modelling the superposed power spectra of musical notes, in which energy is concentrated at multiples of the fundamental frequency of each note. Our score follower begins by using a GP to statistically infer the musical notes played during 800-sample 'audioframes' (~18 ms) of solo piano music. These predictions are then used in a duration-dependent Hidden Markov Model to predict the most likely score positions in real time. Our two-stage approach achieves successful score following not only on four-part hymns arranged for keyboard, but also on pieces for the violin, oboe, and flute. This showcases the powerful and flexible nature of GPs for statistical inference on musical audio signals. Given the success of this project, we contribute to the literature a first proof of concept of the application of GPs in score following, and more broadly, in online Music Information Retrieval (MIR) tasks. This project also contributes a working score follower product that renders score position in real time using an adapted open-source user interface. Areas for future work include improving accuracy on repeated notes and during heavy use of sustain pedal, adapting to minor deviations from the score, and modelling multi-instrument works.
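The first stage described in the abstract can be illustrated with a small sketch. It uses the one-dimensional Spectral Mixture kernel of Wilson and Adams, $k(\tau) = \sum_q w_q \exp(-2\pi^2 \tau^2 v_q) \cos(2\pi \tau \mu_q)$, with the mixture means placed at multiples of a candidate note's fundamental frequency, and scores an 800-sample frame by its GP log marginal likelihood. This is not the report's implementation: the 44.1 kHz sample rate (inferred from 800 samples ≈ 18 ms), the number of harmonics, the $1/h^2$ weights, the bandwidth, the noise variance, and all function names are assumptions made for illustration.

```python
import numpy as np

SAMPLE_RATE = 44_100  # assumption: 800 samples / 44100 Hz is roughly 18 ms
FRAME_LEN = 800       # frame length stated in the abstract


def sm_kernel(tau, weights, means, variances):
    """1-D Spectral Mixture kernel evaluated at time lags `tau` (seconds)."""
    tau = np.asarray(tau, dtype=float)[..., None]
    envelope = np.exp(-2.0 * np.pi**2 * tau**2 * variances)
    return np.sum(weights * envelope * np.cos(2.0 * np.pi * tau * means), axis=-1)


def note_covariance(t, f0, n_harmonics=5, bandwidth_hz=2.0, noise_var=1e-2):
    """Covariance of one frame for a note with fundamental `f0` (Hz).

    Hypothetical hyperparameters: spectral peaks at f0, 2*f0, ...,
    weights decaying as 1/h^2, and a fixed bandwidth per peak.
    """
    h = np.arange(1, n_harmonics + 1)
    weights = 1.0 / h**2
    weights = weights / weights.sum()
    variances = np.full(n_harmonics, bandwidth_hz**2)
    tau = t[:, None] - t[None, :]
    K = sm_kernel(tau, weights, f0 * h, variances)
    return K + noise_var * np.eye(len(t))  # observation noise on the diagonal


def log_marginal_likelihood(y, K):
    """log N(y | 0, K), computed with a Cholesky factorisation."""
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.log(np.diag(L)).sum()
            - 0.5 * len(y) * np.log(2.0 * np.pi))


def infer_note(frame, candidates):
    """Return the candidate note name with the highest log marginal likelihood."""
    t = np.arange(len(frame)) / SAMPLE_RATE
    scores = {name: log_marginal_likelihood(frame, note_covariance(t, f0))
              for name, f0 in candidates.items()}
    return max(scores, key=scores.get), scores
```

Calling `infer_note(frame, {"E4": 329.63, "A4": 440.0})` on one audio frame returns the best-scoring note name together with the per-note scores. In the two-stage design, scores of this kind would feed the Hidden Markov Model rather than being used as a final decision.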

Paper Structure

This paper contains 110 sections, 17 equations, 25 figures, 1 algorithm.

Figures (25)

  • Figure 1: Illustration of score following: positions (blue) in the musical score are mapped to locations in a live-streamed recording (red) of O Haupt voll Blut und Wunden arranged by Bach.
  • Figure 2: Examples from [godsill_2006_bayesian] of different instruments' temporal envelopes (top) and spectrograms (bottom). Spectrograms represent the time-varying magnitude spectra, i.e. the modulus of the short-time Fourier transform. Audio data and images are from the RWCP Instrument samples database.
  • Figure 3: Random samples drawn from three GP models with different underlying covariance functions. From left to right: Rational Quadratic, Exponentiated Quadratic, and Periodic kernels [roelants_2019_gaussian]. Hyperparameters are defined above each graph.
  • Figure 4: Diagram depicting the high-level score follower framework, where numbers represent the steps being completed.
  • Figure 5: Two time-amplitude graphs from a recording of the note A4 ($f_0 = 440$ Hz) on the piano.
  • ...and 20 more figures

Theorems & Definitions (2)

  • Definition 1
  • Definition 2