Computational music analysis from first principles
Dmitri Tymoczko, Mark Newman
TL;DR
The paper addresses long-standing ambiguity in Western harmonic analysis by demonstrating an unsupervised, two-stage hidden Markov framework that annotates a large corpus of Bach chorales with chords and keys from note sequences alone. The method infers chords from notes and then keys from chords, using random initializations with multiple restarts, and reaches chord- and key-level agreement with human analyses comparable to that of prior supervised methods. The resulting datasets reveal familiar harmonic patterns and yield data-driven insights into chord transitions, key modulations, and doubling practices, while also offering a neutral ground for testing theoretical claims about harmonic syntax. The work bridges traditional theory and data-driven analysis, delivers objective resources, and suggests directions for extending such methods to broader repertoires and styles while clarifying methodological assumptions.
Abstract
We use coupled hidden Markov models to automatically annotate the 371 Bach chorales in the Riemenschneider edition, a corpus containing approximately 100,000 notes and 20,000 chords. We give three separate analyses that achieve progressively greater accuracy at the cost of making increasingly strong assumptions about musical syntax. Although our method makes almost no use of human input, we are able to identify both chords and keys with an accuracy of 85% or greater when compared to an expert human analysis, resulting in annotations accurate enough to be used for a range of music-theoretical purposes, while also being free of subjective human judgments. Our work bears on longstanding debates about the objective reality of the structures postulated by standard Western harmonic theory, as well as on specific questions about the nature of Western harmonic syntax.
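The two-stage idea (chords inferred from notes, then keys inferred from chords) can be illustrated with a minimal sketch. This is not the authors' implementation: the state counts, the integer note and chord symbols, every probability value, and the helper names `viterbi` and `annotate` are hypothetical and chosen only for illustration. The sketch also omits the unsupervised fitting step (random initializations with multiple restarts) and simply decodes with hand-set parameters.

```python
import numpy as np

def viterbi(obs, init, trans, emit):
    """Most likely hidden-state path of a discrete HMM, computed in the log domain."""
    log_init, log_trans, log_emit = np.log(init), np.log(trans), np.log(emit)
    T, n_states = len(obs), len(init)
    score = np.empty((T, n_states))
    back = np.zeros((T, n_states), dtype=int)
    score[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, T):
        # cand[i, j]: best log-score ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0) + log_emit[:, obs[t]]
    # Trace the best path backwards from the best final state
    path = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Stage 1 (toy values, assumed): 2 chord states emitting 3 note symbols.
CHORD_INIT = np.array([0.5, 0.5])
CHORD_TRANS = np.array([[0.7, 0.3],
                        [0.3, 0.7]])
CHORD_EMIT = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.1, 0.8]])

# Stage 2 (toy values, assumed): 2 key states emitting the 2 chord labels.
KEY_INIT = np.array([0.5, 0.5])
KEY_TRANS = np.array([[0.95, 0.05],
                      [0.05, 0.95]])
KEY_EMIT = np.array([[0.9, 0.1],
                     [0.4, 0.6]])

def annotate(notes):
    """Decode chords from note symbols, then keys from the decoded chords."""
    chords = viterbi(notes, CHORD_INIT, CHORD_TRANS, CHORD_EMIT)
    keys = viterbi(chords, KEY_INIT, KEY_TRANS, KEY_EMIT)
    return chords, keys

chords, keys = annotate([0, 0, 2, 2, 0])
print("chords:", chords)
print("keys:  ", keys)
```

In this sketch the output of the first decoder becomes the observation sequence of the second, which is the sense in which keys are read off chords rather than directly off notes. In a fitted model the parameter tables would be learned from the corpus rather than written by hand.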
