Perception-Inspired Graph Convolution for Music Understanding Tasks
Emmanouil Karystinaios, Francesco Foscarin, Gerhard Widmer
TL;DR
This work introduces MusGConv, a perception-inspired graph convolution block tailored to symbolic music and designed to capture pitch and rhythm through both relative and absolute representations. It builds edge features from pairwise distances in note onset, duration, and pitch, together with pitch-class intervals, and uses them in a novel edge-based message passing scheme, enabling efficient, transposition- and tempo-aware processing of musical scores. Across four diverse tasks (voice separation, composer classification, Roman numeral analysis, and cadence detection), the approach improves results on three with minimal computational overhead, demonstrating the benefit of perception-informed processing for graph-based music understanding. The results suggest that incorporating pairwise note relations and careful edge-feature design can enhance graph neural networks for music without substantial added complexity, with potential for broader impact in music information retrieval (MIR) applications.
Abstract
We propose a new graph convolutional block, called MusGConv, specifically designed for the efficient processing of musical score data and motivated by general perceptual principles. It focuses on two fundamental dimensions of music, pitch and rhythm, and considers both relative and absolute representations of these components. We evaluate our approach on four different music understanding problems: monophonic voice separation, harmonic analysis, cadence detection, and composer identification. In abstract terms, these translate to different graph learning problems, namely node classification, link prediction, and graph classification. Our experiments demonstrate that MusGConv improves performance on three of these tasks while being conceptually very simple and efficient. We interpret this as evidence that it is beneficial to include perception-informed processing of fundamental musical concepts when developing graph network applications on musical score data.
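To make the idea of relative pitch and rhythm representations concrete, the following is a minimal sketch of how pairwise edge features might be computed for a score graph. It is not the authors' implementation: the function name `relative_edge_features`, the use of MIDI pitch numbers and beats as units, and the choice of absolute differences are all assumptions made for illustration.

```python
import numpy as np


def relative_edge_features(onset, duration, pitch, edge_index):
    """Hypothetical helper: one feature row per directed edge (src -> dst).

    onset, duration: float arrays of shape (num_notes,), assumed to be in beats.
    pitch: int array of shape (num_notes,), assumed to be MIDI pitch numbers.
    edge_index: int array of shape (2, num_edges) holding source and target note ids.
    """
    src, dst = edge_index
    # Assumption: absolute differences; a signed variant is equally plausible.
    onset_dist = np.abs(onset[dst] - onset[src])
    duration_dist = np.abs(duration[dst] - duration[src])
    pitch_dist = np.abs(pitch[dst] - pitch[src])
    # Pitch-class interval: ascending interval folded into one octave (0..11).
    pc_interval = (pitch[dst] - pitch[src]) % 12
    return np.stack(
        [onset_dist, duration_dist, pitch_dist, pc_interval], axis=1
    ).astype(float)


# Three notes of an arpeggiated C major triad: C4, E4, G4.
onset = np.array([0.0, 1.0, 2.0])
duration = np.array([1.0, 1.0, 2.0])
pitch = np.array([60, 64, 67])
# Two edges linking consecutive notes: 0 -> 1 and 1 -> 2.
edge_index = np.array([[0, 1], [1, 2]])

feats = relative_edge_features(onset, duration, pitch, edge_index)
# Transposing every note up a perfect fourth leaves the edge features unchanged.
feats_transposed = relative_edge_features(onset, duration, pitch + 5, edge_index)
print(feats)
print(np.array_equal(feats, feats_transposed))
```

Because every feature depends only on differences between the two notes of an edge, a global transposition cancels out. This is one way to read the transposition-aware behavior described above, while absolute pitch information would still have to come from the node features.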
