AutoSchA: Automatic Hierarchical Music Representations via Multi-Relational Node Isolation

Stephen Ni-Hahn, Rico Zhu, Jerry Yin, Yue Jiang, Cynthia Rudin, Simon Mak

TL;DR

This work reframes Schenkerian hierarchical analysis as a graph pooling problem on symbolic music and introduces AutoSchA, a multi-relational GNN with a novel node-isolation pooling mechanism. The approach combines directed multi-relational convolution, adaptive pooling losses, and two global feature strategies (sequential and subspace merging) to infer depth-wise hierarchical structures and voice assignments. Empirical results show AutoSchA achieving performance near human experts on Baroque fugue subjects, with ablations highlighting the importance of rhythmic over pitch features. The framework opens avenues for AI-assisted music theory, generation, and broader hierarchical music analysis using graph-based representations.

Abstract

Hierarchical representations provide powerful and principled approaches for analyzing many musical genres. Such representations have been broadly studied in music theory, for instance via Schenkerian analysis (SchA). Hierarchical music analyses, however, are highly cost-intensive; the analysis of a single piece of music requires a great deal of time and effort from trained experts. The representation of hierarchical analyses in a computer-readable format is a further challenge. Given recent developments in hierarchical deep learning and increasing quantities of computer-readable data, there is great promise in extending such work for an automatic hierarchical representation framework. This paper thus introduces a novel approach, AutoSchA, which extends recent developments in graph neural networks (GNNs) for hierarchical music analysis. AutoSchA features three key contributions: 1) a new graph learning framework for hierarchical music representation, 2) a new graph pooling mechanism based on node isolation that directly optimizes learned pooling assignments, and 3) a state-of-the-art architecture that integrates such developments for automatic hierarchical music analysis. We show, in a suite of experiments, that AutoSchA performs comparably to human experts when analyzing Baroque fugue subjects.

Paper Structure

This paper contains 25 sections, 8 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Top: SchA, Pachelbel's Primi Toni No. 1, fugue subject. Bottom: Mathematical representation of SchA as several bit arrays denoting whether each note belongs to a given depth. Note that higher depths are included in lower depths (the first note is included in all depths from 0 to 4).
  • Figure 2: Overview of model architecture: diagram for the pooling GNN. Blue arrows indicate the major flow of GNN convolution, green arrows represent global feature extraction, yellow indicates threshold masking, red describes information regarding the scoring model and loss functions, and purple arrows show the flow of backpropagation. Given input graph $\mathcal{G} = (\mathbf{X}, \mathbf{A}_1, ..., \mathbf{A}_m)$, we first perform a directed multi-relational convolution (parameterized by $\theta_{conv}$) over input features $\mathbf{X}$ to generate node embeddings $\mathbf{Z}^{(l)}$ and global feature matrix $\mathbf{X}_{\text{global}}$ (see Figure 4 for details). $\mathbf{Z}^{(l)}$ and $\mathbf{X}_{\text{global}}$ are concatenated and passed to the scoring model (parameterized by $\theta_{score}$), generating pooling scores $\mathbf{\hat{y}}^{(l)}$. We aim to minimize the cross-entropy $\mathcal{L}_p$ between the pooling scores and ground truth assignments. Based on $\mathbf{\hat{y}}^{(l)}$ and threshold $c_{min}$, the nodes are masked or "isolated" for the next GNN layer $l+1$, to which $\mathbf{Z}^{(l)}$ is passed in place of $\mathbf{X}$. To ensure monotonicity in pooling scores, we add a regularizer term $\mathcal{L}_m$ computed from scores at layer $l$ and layer $l+1$.
  • Figure 3: Primi Toni No. 1 as a multi-relational graph.
  • Figure 4: Two approaches to compute the global embedding, $\mathbf{X}_{\text{global}}$. (1) A sequential approach, leveraging a transformer to encode long-range dependencies. We obtain a canonical sequence representation of our graph via the topological order of the forward edges. (2) A subspace merging approach, based on computing a unified global topology fusing all edge types. We compute a fused global graph $\mathbf{A}_{\text{mod}}$ given fixed embeddings $\mathbf{U}_1, ..., \mathbf{U}_m$ from the Laplacians $\mathbf{L}_1, ..., \mathbf{L}_m$, and then convolve the original features $\mathbf{X}$ over $\mathbf{A}_{\text{mod}}$. Further details are in the technical appendix.
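
The nested bit-array encoding in Figure 1 and the threshold masking in Figure 2 can be illustrated with a small sketch. This is not the authors' implementation: the function names, the per-note "maximum depth" input, the toy scores, and the rule that a node stays active when its score is at least $c_{min}$ are all assumptions made for illustration only.

```python
from typing import List


def depth_bit_arrays(max_depth_per_note: List[int], num_depths: int) -> List[List[int]]:
    """Row d holds a 1 for each note present at depth d.

    Assumed encoding: each note is described by the deepest level it
    survives to, so it is present at every depth up to that level.
    """
    return [[1 if m >= d else 0 for m in max_depth_per_note]
            for d in range(num_depths)]


def is_nested(bits: List[List[int]]) -> bool:
    """True if the notes at each depth are a subset of the notes one depth lower."""
    return all(hi <= lo
               for lower, higher in zip(bits, bits[1:])
               for lo, hi in zip(lower, higher))


def isolate_nodes(scores: List[float], active: List[int], c_min: float) -> List[int]:
    """Hypothetical masking rule: a node stays active for the next layer
    only if it is currently active and its pooling score reaches c_min."""
    return [1 if a == 1 and s >= c_min else 0 for s, a in zip(scores, active)]


# Toy five-note subject; the first note survives to the deepest level.
max_depth = [4, 0, 2, 1, 3]
bits = depth_bit_arrays(max_depth, num_depths=5)

# Made-up pooling scores for one layer, thresholded at c_min = 0.5.
next_active = isolate_nodes([0.9, 0.2, 0.7, 0.6, 0.8], bits[0], c_min=0.5)
```

In this toy case the thresholded mask coincides with the depth-1 bit array, which is the kind of agreement the cross-entropy term $\mathcal{L}_p$ is described as encouraging. Because an isolated node can never become active again under this rule, successive masks are nested by construction.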