Table of Contents
Fetching ...

Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

Nicolas Boulanger-Lewandowski, Yoshua Bengio, Pascal Vincent

TL;DR

A probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences that outperforms many traditional models of polyphonic music on a variety of realistic datasets is introduced.

Abstract

We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to improve the accuracy of polyphonic transcription.

Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription

TL;DR

A probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences that outperforms many traditional models of polyphonic music on a variety of realistic datasets is introduced.

Abstract

We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to improve the accuracy of polyphonic transcription.

Paper Structure

This paper contains 10 sections, 16 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Mean-field samples of an RBM trained on the Piano-midi (top) and JSB chorales (bottom) datasets. Each column is a sample vector of notes, with a chord label where the analysis is unambiguous.
  • Figure 2: Comparison of the graphical structures of (a) the RTRBM and (b) the single-layer RNN-RBM. Single arrows represent a deterministic function, double arrows represent the stochastic hidden-visible connections of an RBM. The upper half of the RNN-RBM is the RBM stage while the lower half is a RNN with hidden units$\hat{h}^{(t)}$. The RBM biases $b_{h}^{(t)}, b_{v}^{(t)}$ are a linear function of $\hat{h}^{(t-1)}$.
  • Figure 3: Receptive fields of 48 hidden units of an RNNRBM trained on the bouncing balls dataset. Each square shows the input weights of a hidden unit as an image.
  • Figure 4: Effect of SGD and HF pretraining on the RNNRBM symbolic prediction performance. All strategies except the baseline involve pretraining.
  • Figure 5: Frame-level transcription accuracy of the Nam et al. (2011) model either alone, after HMM smoothing or with our best performing model as a symbolic prior.