Deconstructing Jazz Piano Style Using Machine Learning

Huw Cheston, Reuben Bance, Peter M. C. Harrison

TL;DR

This work investigates how machine learning can illuminate jazz piano style by identifying performers and revealing the musical features that distinguish their playing. It compares handcrafted feature pipelines with representation-learning CNNs and introduces a novel four-domain, multi-input architecture that encodes melody, harmony, rhythm, and dynamics separately before integrating them. The study reports state-of-the-art performer identification accuracy (94% across 20 classes), strong interpretability for the handcrafted and multi-input models, and informative visualizations via LIME and concept analyses tied to jazz theory. Collectively, the methods offer scalable, explainable insights into individual artists’ stylistic signatures, with practical implications for pedagogy, analysis, and cross-genre comparisons.

Abstract

Artistic style has been studied for centuries, and recent advances in machine learning create new possibilities for understanding it computationally. However, ensuring that machine-learning models produce insights aligned with the interests of practitioners and critics remains a significant challenge. Here, we focus on musical style, which benefits from a rich theoretical and mathematical analysis tradition. We train a variety of supervised-learning models to identify 20 iconic jazz musicians across a carefully curated dataset of 84 hours of recordings, and interpret their decision-making processes. Our models include a novel multi-input architecture that enables four musical domains (melody, harmony, rhythm, and dynamics) to be analysed separately. These models enable us to address fundamental questions in music theory and also advance the state-of-the-art in music performer identification (94% accuracy across 20 classes). We release open-source implementations of our models and an accompanying web application for exploring musical styles.

Paper Structure

This paper contains 49 sections, 4 equations, 59 figures, 5 tables.

Figures (59)

  • Figure 1: Dataset. The upper row of panels shows (a) the number of recordings by each pianist and (b) the total duration of their recordings, stratified by source database. Pianists are sorted in descending order by total duration, with the shaded area indicating where this exceeds 80 minutes. (c) shows the distribution of recording durations across tracks in both databases, with dashed lines representing the median recording duration. (d) shows the number of 30-second clips in each split of the dataset used to train the neural networks.
  • Figure 2: Feature counts. Each panel shows the counts of the twenty most frequently occurring (a) melody and (b) harmony features across all recordings and splits of the dataset. The $x$-axis values can be interpreted as the number of semitones from an initial starting note.
  • Figure 3: Relationships between databases. Each panel shows correlation coefficients obtained between the weights of every performer for models fitted using (a) melody and (b) harmony features. Performers are sorted in descending order of magnitude according to the correlations in panel (a). Asterisks show the significance of the observed coefficients ($^* \ p < .05, \ ^{**} \ p < .01, \ ^{***} \ p < .001$, with Bonferroni correction applied), calculated using permutation testing ($N = 1,000$ iterations).
  • Figure 4: Predictive melody features. Both panels show feature weights obtained for the top and bottom five melody $n$-grams associated with classifications of (a) Bill Evans and (b) Oscar Peterson. Error bars show $SD$, calculated by bootstrapping the dataset used to fit the model ($N = 1,000$ iterations). Corresponding musical notation for every $n$-gram is on the right, transposed such that the first note is C and the mean pitch height is approximately centred around $G_4$. The pitch spelling for each feature is estimated using the algorithm described by Meredith (2006).
  • Figure 5: PCA projections. The horizontal and vertical axes respectively show melody feature and performer projections onto (a) principal components 1 and 2 and (b) components 3 and 4. Values are linearly scaled separately for each dimension to within the range $(-1, 1)$. Each melody feature shows the number of semitones relative to the initial note $0$, as in Figure 2. To select the melody features to plot, we divide the two-dimensional representation into eight circular "slices" and within each slice plot the 5 features with the highest absolute magnitudes. Performers are shown by their initials, such that BE corresponds to Bill Evans.
  • ...and 54 more figures
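
The captions for Figures 2 and 5 describe melody features as semitone offsets from an initial note. As a rough illustration of that encoding, the sketch below turns a sequence of MIDI pitches into transposition-invariant $n$-grams and counts them. It is not the authors' implementation: the function names, the choice of $n = 3$, the use of MIDI note numbers, and the decision to keep the leading 0 in each feature are all assumptions made for the example.

```python
from collections import Counter


def interval_ngrams(pitches, n):
    """Yield each run of n consecutive pitches as semitone offsets
    from the first note of that run (hypothetical helper)."""
    for i in range(len(pitches) - n + 1):
        window = pitches[i:i + n]
        first = window[0]
        yield tuple(p - first for p in window)


def count_ngrams(melodies, n=3):
    """Count offset n-grams across several melodies (hypothetical helper).

    `melodies` is an iterable of pitch sequences, assumed here to be
    MIDI note numbers; n=3 is an arbitrary choice for illustration.
    """
    counts = Counter()
    for pitches in melodies:
        counts.update(interval_ngrams(pitches, n))
    return counts


# Toy melody: C4 D4 E4 D4 E4 F#4 as MIDI note numbers.
melody = [60, 62, 64, 62, 64, 66]
counts = count_ngrams([melody], n=3)
top_feature, top_count = counts.most_common(1)[0]
```

Because every $n$-gram is expressed relative to its own first note, the runs C-D-E and D-E-F# in the toy melody map to the same feature, `(0, 2, 4)`, which is what makes counts comparable across keys.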