Counterpoint by Convolution

Cheng-Zhi Anna Huang, Tim Cooijmans, Adam Roberts, Aaron Courville, Douglas Eck

TL;DR

The paper presents Coconet, a convolutional model that performs partial-score reconstruction via an orderless NADE framework for polyphonic Bach chorales. It demonstrates that blocked Gibbs sampling, including independent blocked Gibbs with annealed masking, substantially improves sample quality over naive ancestral sampling. Through framewise log-likelihood evaluation, sampling experiments, and human assessments, the authors show that randomized orderings and rewrite-style updates yield more Bach-like, coherent music. The approach enables flexible partial-score completion and rewriting tasks, offering a practical tool for composers and music-information research. The work highlights the advantages of combining orderless probabilistic modeling with efficient Gibbs-based inference in structured, high-dimensional domains like polyphonic music.

Abstract

Machine learning models of music typically break up the task of composition into a chronological process, composing a piece of music in a single pass from beginning to end. On the contrary, human composers write music in a nonlinear fashion, scribbling motifs here and there, often revisiting choices previously made. In order to better approximate this process, we train a convolutional neural network to complete partial musical scores, and explore the use of blocked Gibbs sampling as an analogue to rewriting. Neither the model nor the generative procedure are tied to a particular causal direction of composition. Our model is an instance of orderless NADE (Uria et al., 2014), which allows more direct ancestral sampling. However, we find that Gibbs sampling greatly improves sample quality, which we demonstrate to be due to some conditional distributions being poorly modeled. Moreover, we show that even the cheap approximate blocked Gibbs procedure from Yao et al. (2014) yields better samples than ancestral sampling, based on both log-likelihood and human evaluation.
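
The rewriting procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`uniform_model`, `blocked_gibbs`), the pianoroll encoding (one integer pitch per voice per time step), and the linear annealing schedule for the masking probability are all assumptions made for the example, and `uniform_model` is a hypothetical stand-in for a trained convolutional network.

```python
import numpy as np


def uniform_model(pitches, mask, num_pitches):
    """Hypothetical stand-in for a trained network.

    A real model would condition on the unmasked cells of `pitches`;
    this placeholder ignores the context and returns a uniform
    distribution over pitches for every (voice, time) cell.
    """
    voices, steps = pitches.shape
    return np.full((voices, steps, num_pitches), 1.0 / num_pitches)


def blocked_gibbs(pitches, model, num_pitches, num_steps=50,
                  p_max=0.9, p_min=0.1, rng=None):
    """Approximate blocked Gibbs sampling with an annealed mask size.

    pitches: int array of shape (voices, time_steps).
    model:   callable(pitches, mask, num_pitches) returning an array of
             shape (voices, time_steps, num_pitches) of probabilities.
    """
    rng = np.random.default_rng() if rng is None else rng
    pitches = pitches.copy()  # leave the caller's score untouched
    for n in range(num_steps):
        # Assumed schedule: masking probability decays linearly
        # from p_max to p_min over the run.
        p_mask = p_max - (p_max - p_min) * n / max(num_steps - 1, 1)
        # Choose the block of cells to erase and rewrite at this step.
        mask = rng.random(pitches.shape) < p_mask
        probs = model(pitches, mask, num_pitches)
        # Resample every masked cell independently from its predicted
        # distribution; unmasked cells keep their current values.
        for i, t in zip(*np.nonzero(mask)):
            pitches[i, t] = rng.choice(num_pitches, p=probs[i, t])
    return pitches


if __name__ == "__main__":
    start = np.zeros((4, 32), dtype=int)  # four voices, 32 time steps
    sample = blocked_gibbs(start, uniform_model, num_pitches=46,
                           rng=np.random.default_rng(0))
    print(sample.shape)
```

Starting with a large mask lets early steps rewrite most of the score, while the shrinking mask in later steps makes smaller, local revisions to an almost complete piece.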

Paper Structure

This paper contains 12 sections, 8 equations, 4 figures, 2 tables, 1 algorithm.

Figures (4)

  • Figure 1: Blocked Gibbs inpainting of a corrupted Bach chorale by Coconet. At each step, a random subset of notes is removed, and the model is asked to infer their values. New values are sampled from the probability distribution put out by the model, and the process is repeated. Left: annealed masks show resampled variables. Colors distinguish the four voices. Middle: grayscale heatmaps show predictions $p(\mathbf{x}_j ~|~ \mathbf{x}_C)$ summed across instruments. Right: complete pianorolls after resampling the masked variables. Bottom: a sample from NADE (left) and the original Bach chorale fragment (right).
  • Figure 2: Likelihood under the model for ancestral Gibbs samples obtained with various context distributions $p(C)$. NADE ($\mathrm{Bernoulli}(0.00)$) is included for reference.
  • Figure 3: Human evaluations from MTurk showing how many times each sampling procedure, or Bach, was perceived as more Bach-like. Error bars show the standard deviation of a binomial distribution fitted to the binary win/loss counts of each.
  • Figure :