Perception-Inspired Graph Convolution for Music Understanding Tasks

Emmanouil Karystinaios, Francesco Foscarin, Gerhard Widmer

TL;DR

This work introduces MusGConv, a perception-inspired graph convolution block tailored to symbolic music, designed to capture pitch and rhythm through relative and absolute representations. By constructing edge features from note onset, duration, pitch distances, and pitch-class intervals, and by a novel edge-based message passing scheme, MusGConv enables efficient, transposition- and tempo-aware processing of musical scores. Across four diverse tasks—voice separation, composer classification, Roman numeral analysis, and cadence detection—the approach yields improvements on three tasks with minimal computational overhead, demonstrating the benefit of perception-informed processing for graph-based music understanding. The results suggest that incorporating pairwise note relations and careful edge-feature design can enhance musical GNNs without added complexity, with potential for broader impact in MIR applications.
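
As a rough illustration of the edge-feature idea summarized above, the following minimal sketch computes pairwise features between two notes. The `Note` class, the function names, the use of absolute differences, and the choice of beats as the time unit are all assumptions made for this example; the exact feature definitions used by MusGConv are not given in this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """Hypothetical note record; field names and units are assumptions."""
    onset: float     # onset time in beats
    duration: float  # duration in beats
    pitch: int       # MIDI pitch number


def pitch_class_interval(src: Note, dst: Note) -> int:
    """Smallest distance between the two pitch classes, in semitones (0..6)."""
    diff = abs(src.pitch - dst.pitch) % 12
    return min(diff, 12 - diff)


def edge_features(src: Note, dst: Note) -> list[float]:
    """Assumed edge feature vector for an edge between src and dst."""
    return [
        float(abs(dst.onset - src.onset)),        # onset distance
        float(abs(dst.duration - src.duration)),  # duration distance
        float(abs(dst.pitch - src.pitch)),        # pitch distance in semitones
        float(pitch_class_interval(src, dst)),    # octave-independent interval
    ]


# Usage: C4 followed one beat later by a shorter G4.
c4 = Note(onset=0.0, duration=1.0, pitch=60)
g4 = Note(onset=1.0, duration=0.5, pitch=67)
features = edge_features(c4, g4)
```

Because every feature in this sketch is a difference between two notes, transposing both notes by the same number of semitones leaves the vector unchanged, which is one way to read the "relative representation" mentioned above.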

Abstract

We propose a new graph convolutional block, called MusGConv, specifically designed for the efficient processing of musical score data and motivated by general perceptual principles. It focuses on two fundamental dimensions of music, pitch and rhythm, and considers both relative and absolute representations of these components. We evaluate our approach on four different musical understanding problems: monophonic voice separation, harmonic analysis, cadence detection, and composer identification which, in abstract terms, translate to different graph learning problems, namely, node classification, link prediction, and graph classification. Our experiments demonstrate that MusGConv improves the performance on three of the aforementioned tasks while being conceptually very simple and efficient. We interpret this as evidence that it is beneficial to include perception-informed processing of fundamental musical concepts when developing graph network applications on musical score data.

Paper Structure

This paper contains 28 sections, 9 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Three alternative representations of note pitches in a musical excerpt: (a) absolute representation in terms of MIDI pitch; (b) relative pitch distance (ignoring the octave) in semitones relative to the fundamental pitch specified by the key signature (here: C); (c) relative pitch distance in semitones from the closest preceding note; in case of chords the order is defined from bottom to top.
  • Figure 2: General architecture of our pipeline. The first part that produces the hidden node representation is common among all tasks; the last module is task-specific.
  • Figure 3: Visualization of update for node $u$ in our MusGConv block (considering only one edge type), corresponding to Eqns. \ref{eq:our_mespas} and \ref{eq:our_eta}.
  • Figure 4: Relative Pitch features $e^\textrm{pitch}_{vu}$ for the highlighted note $u$.
  • Figure 5: Ablation studies.
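
The three pitch representations described in the Figure 1 caption can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the input is assumed to be a list of MIDI pitches already ordered by onset (with chord notes ordered bottom to top), and the use of signed intervals in the third representation is an assumption, since the caption does not state a sign convention.

```python
def absolute_pitches(midi_pitches):
    """(a) Absolute representation: the MIDI pitch numbers themselves."""
    return list(midi_pitches)


def key_relative_pitch_classes(midi_pitches, tonic_pitch_class=0):
    """(b) Semitones above the tonic, ignoring the octave (0..11).

    tonic_pitch_class=0 corresponds to C, as in the Figure 1 example.
    """
    return [(p - tonic_pitch_class) % 12 for p in midi_pitches]


def intervals_from_preceding(midi_pitches):
    """(c) Signed semitone distance from the preceding note in the ordering.

    The first note has no predecessor, so its entry is None.
    """
    pitches = list(midi_pitches)
    if not pitches:
        return []
    return [None] + [b - a for a, b in zip(pitches, pitches[1:])]


# Usage: an ascending C major arpeggio, C4 E4 G4 C5.
arpeggio = [60, 64, 67, 72]
absolute = absolute_pitches(arpeggio)
relative_to_key = key_relative_pitch_classes(arpeggio, tonic_pitch_class=0)
relative_to_previous = intervals_from_preceding(arpeggio)
```

In this sketch, transposing the excerpt together with its tonic changes (a) but leaves (b) and (c) unchanged, which reflects the distinction between absolute and relative representations that the figure illustrates.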