Generating Music with Structure Using Self-Similarity as Attention

Sophia Hager; Kathleen Hablutzel; Katherine M. Kinnaird

TL;DR

The addition of the proposed attention mechanism significantly improves the network's ability to replicate specific structures, and it performs better on an unseen test set than a model without the attention mechanism.

Abstract

Despite innovations in deep learning and generative AI, creating long-term structure, as well as the layers of repeated structure common in musical works, remains an open challenge in music generation. We propose an attention layer that uses a novel approach applying user-supplied self-similarity matrices to previous time steps, and demonstrate it in our Similarity Incentivized Neural Generator (SING) system, a deep learning autonomous music generation system with two layers. The first is a vanilla Long Short-Term Memory (LSTM) layer, and the second is the proposed attention layer. During generation, this attention mechanism imposes a suggested structure from a template piece on the generated music. We train SING on the MAESTRO dataset using a novel variable batching method, and compare its performance to the same model without the attention mechanism. The addition of our proposed attention mechanism significantly improves the network's ability to replicate specific structures, and it performs better on an unseen test set than a model without the attention mechanism.
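The idea of applying a self-similarity matrix (SSM) to previous time steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the layer's equations, so this sketch assumes that row t of a template SSM, restricted to earlier steps and passed through sparsemax (an activation named in the paper's outline), weights the earlier LSTM outputs. The function names `sparsemax` and `ssm_attention_step` and the array shapes are hypothetical.

```python
import numpy as np

def sparsemax(z):
    """Project scores z onto the probability simplex; unlike softmax,
    the result can contain exact zeros."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]            # scores in descending order
    k = np.arange(1, z.size + 1)
    cumsum = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cumsum    # entries that stay non-zero
    k_max = k[support][-1]
    tau = (cumsum[k_max - 1] - 1) / k_max  # threshold subtracted from all scores
    return np.maximum(z - tau, 0.0)

def ssm_attention_step(ssm, past_outputs, t):
    """Assumed formulation: weight the outputs of steps 0..t-1 by row t
    of the template SSM and return their weighted sum."""
    if t == 0:
        return np.zeros(past_outputs.shape[1])  # no history to attend to yet
    weights = sparsemax(ssm[t, :t])
    return weights @ past_outputs[:t]

# Toy template: step 2 is marked as similar to step 0 but not to step 1.
ssm = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0]])
outputs = np.array([[1.0, 2.0],
                    [3.0, 4.0],
                    [5.0, 6.0]])  # stand-in for per-step LSTM outputs
context = ssm_attention_step(ssm, outputs, t=2)  # equals outputs[0] here
```

In this toy case the template marks step 2 as a repeat of step 0, so the context vector for step 2 is exactly the output at step 0. That is the sense in which a template SSM can suggest structure during generation.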

Paper Structure (22 sections, 8 equations, 3 figures, 2 tables)

Introduction
Motivation and Background
Methods
Dataset
Data Pre-Processing
Variable-Length Batching
Self-Similarity Matrices (SSMs)
Network Structure
LSTM Layer
Sparsemax Activation
Attention Layer
Sampling
Loss Function
Training Process
Methods of Evaluation
...and 7 more sections

Figures (3)

Figure 1: An example of a self-similarity matrix after the preprocessing for batching has been applied. Yellow regions indicate higher similarity, while blue regions indicate lower similarity. There are high level structures, such as the region from about 170-400 having relative similarity to itself compared to surrounding regions, and lower-level structure, such as the more minor variations in similarity within that region.
Figure 2: An example of the SSMs from generation. From left to right: the original SSM, the SSM generated by SING, and the SSM generated by an LSTM. The SSM of SING is closer to the original SSM than that of the comparison model, which demonstrates little structure, if any.
Figure 3: From left to right, a synthetic self-similarity matrix, the piece as generated by SING, and the piece as generated by the comparison model. SING generates a piece that resembles the synthetic SSM, while the comparison model cannot.
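
The matrices described in the figure captions compare every time step of a piece with every other time step. As a rough illustration only, the sketch below builds such a matrix with cosine similarity between per-step feature vectors. The similarity measure and the feature representation used in the paper are not stated here, so both are assumptions, and `self_similarity_matrix` is a hypothetical helper name.

```python
import numpy as np

def self_similarity_matrix(features):
    """Cosine-similarity SSM for a (time_steps, feature_dim) array.
    Entry (i, j) is high when steps i and j have similar features."""
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against all-zero (silent) steps
    return unit @ unit.T

# Toy piece: steps 0 and 2 share the same features, step 1 differs.
frames = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [1.0, 0.0]])
ssm = self_similarity_matrix(frames)
```

Repeated material shows up as high values away from the main diagonal, which is the kind of block pattern the Figure 1 caption describes.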
