Learning and composing of classical music using restricted Boltzmann machines

Mutsumi Kobayashi, Hiroshi Watanabe

TL;DR

The paper investigates how a simple, transparent RBM can learn to compose music from piano-roll representations, aiming to shed light on internal representations rather than optimizing performance. By training on Bach piano-rolls and evaluating reconstruction, energy, and generation capabilities, the study shows the model can produce musically structured pieces but encodes information in a way not directly aligned with conventional music theory. Through targeted analyses (including t-SNE of hidden units and transposition tests), the work highlights how absolute pitch information can dominate learned representations and discusses interpretability limits of generative models. The results contribute to understanding the trade-off between simplicity, interpretability, and creative capability in AI-driven music generation, and the authors propose greater translational invariance and broader corpora as avenues for future work.

Abstract

We investigate how machine learning models acquire the ability to compose music and how musical information is internally represented within such models. We develop a composition algorithm based on a restricted Boltzmann machine (RBM), a simple generative model capable of producing musical pieces of arbitrary length. We convert musical scores into piano-roll image representations and train the RBM in an unsupervised manner. We confirm that the trained RBM can generate new musical pieces; however, by analyzing the model's responses and internal structure, we find that the learned information is not stored in a form directly interpretable by humans. This study contributes to a better understanding of how machine learning models capable of music composition may internally represent musical structure and highlights issues related to the interpretability of generative models in creative tasks.
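
For readers unfamiliar with RBMs, the sketch below illustrates the two quantities the summary refers to, energy and reconstruction, for a standard binary RBM. The paper's own equations are not reproduced in this overview, so the textbook energy function is assumed here, and every name (`energy`, `hidden_probabilities`, `reconstruct`, the weight matrix `W`, the biases `a` and `b`) is illustrative rather than taken from the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def energy(v, h, W, a, b):
    """Textbook binary-RBM energy (assumed form, not quoted from the paper):
    E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_ij v_i W_ij h_j."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -(visible_term + hidden_term + interaction)


def hidden_probabilities(v, W, b):
    """P(h_j = 1 | v) for each hidden unit j."""
    return [
        sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(b))
    ]


def visible_probabilities(h, W, a):
    """P(v_i = 1 | h) for each visible unit i."""
    return [
        sigmoid(a[i] + sum(W[i][j] * h[j] for j in range(len(h))))
        for i in range(len(a))
    ]


def sample(probabilities, rng):
    """Draw one binary value per unit from its Bernoulli probability."""
    return [1 if rng.random() < p else 0 for p in probabilities]


def reconstruct(v, W, a, b, rng):
    """One Gibbs step: visible -> sampled hidden -> sampled visible."""
    h = sample(hidden_probabilities(v, W, b), rng)
    return sample(visible_probabilities(h, W, a), rng)
```

In this picture a piano-roll image would be flattened into the visible vector `v`, one unit per pixel. Comparing `v` with `reconstruct(v, ...)` gives a reconstruction, and `energy` assigns a score to a visible-hidden configuration.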

Paper Structure

This paper contains 13 sections, 4 equations, 12 figures, 2 tables, 2 algorithms.

Figures (12)

  • Figure 1: (Color online) Piano roll representation of a musical segment. The horizontal axis represents time (note duration), with a total width of 192 pixels corresponding to two measures in 4/4 time. The vertical axis represents pitch, spanning 72 pixels from C1 (the lowest C on a standard 88-key piano) to B6. Each horizontal bar indicates a note, with its vertical position corresponding to pitch and its horizontal length indicating duration. A quarter note is represented by 24 pixels in width.
  • Figure 2: (Color online) Schematic illustration of the music composition procedure using the trained RBM.
  • Figure 3: (Color online) Schematic illustration of the procedure for composing a continuation from an already generated piano roll.
  • Figure 4: Reconstruction of piano roll images by the trained RBM. (a) Piano roll image of a J. S. Bach composition used for training. (a') Image reconstructed from (a) by the trained RBM. (b) Piano roll image of a W. A. Mozart composition not used during training. (b') Image reconstructed from (b) by the trained RBM.
  • Figure 5: Reconstruction of digit images by the trained RBM. (a), (b), (c): Input images from the MNIST dataset. (a'), (b'), (c'): Corresponding output images generated by the RBM. As evident from the outputs, the RBM fails to reconstruct the digit images and instead produces noise-like results, indicating that it has not generalized to image types outside the piano roll domain.
  • ...and 7 more figures
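
The piano-roll geometry described in the Figure 1 caption (192 pixels of time for two 4/4 measures, 72 pitch rows from C1 to B6, 24 pixels per quarter note) can be sketched as a small rasterizer. This is a minimal illustration, not the authors' conversion code. The note tuple format, the function name `notes_to_piano_roll`, the choice of MIDI number 24 for C1 (the convention in which middle C is C4 = 60), and the row ordering (row 0 is the lowest pitch) are all assumptions.

```python
WIDTH = 192              # time pixels: two measures in 4/4 (from the caption)
HEIGHT = 72              # pitch rows: C1 through B6 (from the caption)
PIXELS_PER_QUARTER = 24  # width of a quarter note (from the caption)
LOWEST_MIDI = 24         # assumption: C1 = MIDI 24 (middle C = C4 = 60)


def notes_to_piano_roll(notes):
    """Rasterize notes into a HEIGHT x WIDTH binary grid.

    `notes` is an iterable of (midi_pitch, start, duration) tuples, with
    start and duration measured in quarter notes (hypothetical format).
    Row 0 holds the lowest pitch; an image renderer would typically flip
    the rows so that higher pitches appear at the top.
    """
    roll = [[0] * WIDTH for _ in range(HEIGHT)]
    for pitch, start, duration in notes:
        row = pitch - LOWEST_MIDI
        if not 0 <= row < HEIGHT:
            continue  # skip pitches outside C1..B6
        begin = max(0, round(start * PIXELS_PER_QUARTER))
        end = min(WIDTH, round((start + duration) * PIXELS_PER_QUARTER))
        for col in range(begin, end):
            roll[row][col] = 1
    return roll


# Usage: a quarter-note middle C followed by a half-note E above it.
example_roll = notes_to_piano_roll([(60, 0, 1), (64, 1, 2)])
```

The arithmetic matches the caption: eight quarter notes at 24 pixels each fill the 192-pixel width, and six octaves of twelve semitones fill the 72 rows.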