A New Dataset, Notation Software, and Representation for Computational Schenkerian Analysis

Stephen Ni-Hahn, Weihan Xu, Jerry Yin, Rico Zhu, Simon Mak, Yue Jiang, Cynthia Rudin

TL;DR

This work addresses the shortage of large, high-quality, machine-readable Schenkerian data with three contributions: a growing dataset of Schenkerian analyses, a notation tool for collecting them, and a heterogeneous graph representation of Schenkerian Analysis (SchA). The dataset contains over 140 excerpts (145 analyses by multiple analysts) spanning diverse composers, and it is designed to grow over time. The notation tool provides an accessible JSON-based encoding with cross-notation interoperability, while the graph formulation supports flexible modeling of multivoice SchA and clustering-based analysis. Together, these contributions enable data-driven exploration of SchA for music information retrieval and generation tasks, and they lay groundwork for learning complex hierarchical musical structure.

Abstract

Schenkerian Analysis (SchA) is a uniquely expressive method of music analysis, combining elements of melody, harmony, counterpoint, and form to describe the hierarchical structure supporting a work of music. However, despite its analytical power and its potential to improve music understanding and generation, SchA has rarely been used by the computer music community. This is in large part due to the paucity of high-quality data in a computer-readable format. With a larger corpus of Schenkerian data, it may be possible to infuse machine learning models with a deeper understanding of musical structure, thus leading to more "human" results. To encourage further research in Schenkerian analysis and its potential benefits for music informatics and generation, this paper presents three main contributions: 1) a new and growing dataset of Schenkerian analyses, the largest in human- and computer-readable formats to date (>140 excerpts), 2) novel software for visualizing and collecting SchA data, and 3) a novel, flexible representation of SchA as a heterogeneous-edge graph data structure.
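
To make the third contribution more concrete, the following is a minimal sketch of what a graph with typed (heterogeneous) edges could look like in Python. It is not the paper's implementation: the class names `Note` and `HeteroGraph`, the note fields, and the edge types `"forward"` and `"prolongation"` are all assumptions chosen for illustration, and the three-note melody is a toy example rather than an excerpt from the dataset.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Note:
    """One score note; all field names are illustrative assumptions."""
    idx: int      # position in the excerpt
    pitch: int    # MIDI pitch number
    onset: float  # onset time in beats
    voice: str    # e.g. "treble" or "bass"


@dataclass
class HeteroGraph:
    """Graph whose directed edges are grouped under a type label."""
    notes: list
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_edge(self, kind, src, dst):
        self.edges[kind].add((src, dst))

    def neighbors(self, kind, src):
        """Sorted targets reachable from `src` via edges of one type."""
        return sorted(d for s, d in self.edges[kind] if s == src)


# Toy treble line C-D-E (MIDI 60, 62, 64); not taken from the dataset.
notes = [
    Note(0, 60, 0.0, "treble"),
    Note(1, 62, 1.0, "treble"),
    Note(2, 64, 2.0, "treble"),
]
graph = HeteroGraph(notes)

# Hypothetical edge type 1: temporal succession within a voice.
for a, b in zip(notes, notes[1:]):
    graph.add_edge("forward", a.idx, b.idx)

# Hypothetical edge type 2: a deeper-level link that skips the passing tone D.
graph.add_edge("prolongation", 0, 2)
```

Keeping each edge type in its own set lets surface-level adjacency and deeper structural relations be queried independently, which is the general appeal of a heterogeneous-edge design.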

Paper Structure

This paper contains 12 sections, 1 equation, 6 figures.

Figures (6)

  • Figure 1: The primary author's analysis of J.S. Bach's F major fugue subject from Das Wohltemperierte Klavier I.
  • Figure 2: Screenshots of a toy Schenkerian analysis in JSON and graphical form as generated by our notation software.
  • Figure 3: Dataset statistics. A verticality is defined as a point in time where a treble note, a bass note, or both exist. "Inclusive" includes notes of higher depth when counting notes of lower depths. "Literal" counts the note depths as they are defined. The final column describes the distribution of max depths over all excerpts. See the dataset section for more details.
  • Figure 4: Distribution of intervals between consecutive notes at each depth.
  • Figure 5: Visualization of Schenkerian analysis as a series of clustering matrices. The bottom row shows a simple score with Schenkerian annotation moving from all notes in the score to more abstracted versions of the score from left to right. The middle row visualizes the music as a graph. The top row shows the ground truth cluster matrices that relate one layer to the next; rows describe nodes before clustering, while columns describe nodes after clustering.
  • ...and 1 more figure
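
The cluster matrices described in the Figure 5 caption can be illustrated with a short Python sketch. It assumes hard (one-hot) assignments, where each node before clustering maps to exactly one node after clustering; the paper may define its matrices differently. The helper names `cluster_matrix` and `compose`, and the toy assignments, are hypothetical.

```python
def cluster_matrix(assignment, n_clusters):
    """Rows index nodes before clustering, columns index nodes after.

    assignment[i] is the cluster that node i is merged into.
    """
    matrix = [[0] * n_clusters for _ in assignment]
    for node, cluster in enumerate(assignment):
        matrix[node][cluster] = 1
    return matrix


def compose(first, second):
    """Matrix product of two cluster matrices.

    The result relates the nodes of the earliest layer directly to the
    nodes of the latest layer.
    """
    return [
        [
            sum(row[k] * second[k][j] for k in range(len(second)))
            for j in range(len(second[0]))
        ]
        for row in first
    ]


# Toy example: 5 surface notes -> 3 nodes -> 2 nodes (assignments invented).
surface_to_mid = cluster_matrix([0, 0, 1, 2, 2], n_clusters=3)
mid_to_deep = cluster_matrix([0, 0, 1], n_clusters=2)
surface_to_deep = compose(surface_to_mid, mid_to_deep)
```

Under the one-hot assumption every row sums to 1, and chaining the matrices by multiplication shows which node of a more abstract layer each surface note ultimately belongs to.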