Reinforcement Learning Jazz Improvisation: When Music Meets Game Theory

Vedant Tapiavala; Joshua Piesner; Sourjyamoy Barman; Feng Fu

TL;DR

This work frames jazz improvisation as a two-player, payoff-based game and uses reinforcement learning to study strategic interactions under a chord-driven blues structure. The payoff combines a variance (diversity) component $V$ and a harmony component $H$ into $P=\frac{VM-H}{VM+H}$ with a balancing factor $M \approx 1208.7571$, enabling quantitative comparison across strategies. Key findings show that Chord-Following Reinforcement Learning paired with Stepwise Changes achieves the highest mean payoffs, while Harmony Prediction, though learning-based, can produce unstable loops and high variance; non-RL baselines perform poorly, and RL strategies generally improve over time. These results offer a quantitative lens on improvisational strategy and motivate AI-assisted analysis and training on jazz solos to further refine reward structures and strategy adaptation in musical games.
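As a rough illustration of how the payoff formula behaves, the following Python sketch evaluates $P$ for given scores. It assumes $V$ and $H$ are non-negative numbers that have already been computed; how they are measured is not shown here. The function name `payoff`, its parameter names, and the handling of the zero-denominator case are hypothetical choices made for this example, not taken from the paper.

```python
# Balancing factor quoted in the summary (approximate value).
M = 1208.7571


def payoff(variance: float, harmony: float, m: float = M) -> float:
    """Return P = (V*M - H) / (V*M + H).

    `variance` (V) and `harmony` (H) are assumed to be non-negative scores
    that were computed elsewhere; this sketch only combines them.
    """
    scaled = variance * m
    denom = scaled + harmony
    if denom == 0:
        # Assumption for this sketch: treat the payoff as undefined
        # when both terms vanish, rather than picking a default value.
        raise ValueError("payoff undefined when V*M + H == 0")
    return (scaled - harmony) / denom


if __name__ == "__main__":
    # Hypothetical inputs chosen only to illustrate the range of P.
    print(payoff(1.0, 0.0))  # only the V term contributes
    print(payoff(0.0, 1.0))  # only the H term contributes
    print(payoff(1.0, M))    # V*M equals H
```

For non-negative inputs the result always lies in $[-1, 1]$: it rises as $V$ grows and falls as $H$ grows, and $M$ sets the scale at which the two components balance.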

Abstract

Live music performances are always charming, in part because of the unpredictability of improvisation, which arises from the dynamic between musicians and their interactions with the audience. Jazz improvisation is a particularly noteworthy example that merits investigation from a theoretical perspective. Here, we introduce a novel game-theoretic model of jazz improvisation, providing a framework for studying music theory and improvisational methodologies. We use computational modeling, mainly reinforcement learning, to explore diverse stochastic improvisational strategies and how they perform when paired. We find that the most effective strategy pair combines a strategy that reacts to the most recent payoff (Stepwise Changes) with a reinforcement learning strategy limited to notes in the given chord (Chord-Following Reinforcement Learning). Conversely, the strategy pair built on reacting to the partner's last note and attempting to harmonize with it (Harmony Prediction) yields the lowest non-control payoff and the highest standard deviation, indicating that picking notes based on immediate reactions to the partner can yield inconsistent outcomes. On average, the Chord-Following Reinforcement Learning strategy demonstrates the highest mean payoff, while Harmony Prediction exhibits the lowest. Our work lays the foundation for promising applications beyond jazz, including using artificial intelligence (AI) models to extract data from audio clips to refine musical reward systems, and training machine learning (ML) models on existing jazz solos to further refine strategies within the game.

Paper Structure (12 sections, 4 equations, 5 figures, 4 tables)

This paper contains 12 sections, 4 equations, 5 figures, 4 tables.

Introduction
Relevant Music Theory
Methodology and Model
Payoff Calculation
Variance Score (Diversity)
Harmony Score
Multiplication Factor
Improvisation Strategies
Results
Discussion
Limitations of the present approach
Potential Future Research Prospects and Outlook

Figures (5)

Figure 1: Heatmap Indicating Strategy-Strategy Payoffs. R: Randomness, CF: Chord Following, SF: Scale Following, HP: Harmony Prediction, SC: Stepwise Changes, SRL: Simple Reinforcement Learning, CFRL: Chord-Following Reinforcement Learning, CSR: Chord-Specific Reinforcement Learning, TPRL: Two-Player Reinforcement Learning
Figure 2: Average Payoff by Strategy. R: Randomness, CF: Chord Following, SF: Scale Following, HP: Harmony Prediction, SC: Stepwise Changes, SRL: Simple Reinforcement Learning, CFRL: Chord-Following Reinforcement Learning, CSR: Chord-Specific Reinforcement Learning, TPRL: Two-Player Reinforcement Learning
Figure 3: Network Graph Representing Payoff Relationships Between Strategies. R: Randomness, CF: Chord Following, SF: Scale Following, HP: Harmony Prediction, SC: Stepwise Changes, SRL: Simple Reinforcement Learning, CFRL: Chord-Following Reinforcement Learning, CSR: Chord-Specific Reinforcement Learning, TPRL: Two-Player Reinforcement Learning
Figure 4: Reinforcement Learning Payoffs over Time. Red Circles and Red Dashed Line: Simple Reinforcement Learning, Green Squares and Green Dash-Dot Line: Chord-Following Reinforcement Learning, Blue Diamonds and Blue Dotted Line: Chord-Specific Reinforcement Learning, Pink Triangles and Pink Solid Line: Two-Player Reinforcement Learning
Figure S1: QR Code for Chosen Samples of Generated Music
