Incorporating Structure and Chord Constraints in Symbolic Transformer-based Melodic Harmonization
Maximos Kaliakatsos-Papakostas, Konstantinos Soiledis, Theodoros Tsamis, Dimos Makris, Vassilis Katsouros, Emilios Cambouropoulos
TL;DR
The paper tackles enforcing hard chord constraints in transformer-based melodic harmonization. It introduces B*, a beam-search/A*-hybrid algorithm with backtracking to guarantee constraint satisfaction, and compares two tokenization schemes (ChordSymbolTokenizer and PitchClassTokenizer) across BART and GPT-2 on Hook Theory lead sheets transposed to C major/A minor. It shows that naive constraint prompts are often ignored by standard transformers, while B* achieves constraint satisfaction in the majority of tested cases within a practical model-call budget; soft constraints help but do not solve hard-constraint enforcement. The work highlights important trade-offs between tokenization granularity, search complexity, and musical coherence, and points to future improvements through heuristics and learned evaluators to accelerate constraint-driven harmonization.
Abstract
Transformer architectures offer significant advantages for the generation of symbolic music; their capabilities for incorporating user preferences into what they generate are being studied from many angles. This paper studies the inclusion of predefined chord constraints in melodic harmonization, i.e., the setting where a desired chord at a specific location is provided along with the melody as input, and the autoregressive transformer model needs to incorporate that chord in the harmonization it generates. The peculiarities of involving such constraints are discussed and an algorithm is proposed for tackling this task. This algorithm is called B*; it combines aspects of beam search and A* with backtracking to force pretrained transformers to satisfy the chord constraints at the correct onset position within the correct bar. The algorithm is brute-force and has exponential complexity in the worst case; however, this paper is a first attempt to highlight the difficulties of the problem, and it proposes an algorithm that offers many possibilities for improvement, since it accommodates the involvement of heuristics.
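
The general idea of combining a beam-limited expansion with best-first ordering and backtracking can be illustrated with a small toy sketch. This is not the authors' B* implementation: the function `constrained_search`, the stand-in `toy_model`, the parameter names, and the simplification of a chord constraint to "token at index i must equal a given chord" are all assumptions made for illustration. In the paper the constraint refers to an onset position within a bar, which is harder to track than a plain token index.

```python
import heapq
import math
from typing import Callable, Dict, List, Optional, Tuple

Prefix = Tuple[str, ...]


def constrained_search(
    next_probs: Callable[[Prefix], Dict[str, float]],
    length: int,
    constraints: Dict[int, str],
    beam_width: int = 2,
    max_calls: int = 1000,
) -> Optional[Prefix]:
    """Best-first search over token prefixes (hypothetical sketch).

    The frontier is ordered by cumulative negative log-probability.
    Popping an older, less likely prefix after a branch dies out is
    what plays the role of backtracking here.
    """
    frontier: List[Tuple[float, Prefix]] = [(0.0, ())]
    calls = 0
    while frontier and calls < max_calls:
        cost, prefix = heapq.heappop(frontier)
        if len(prefix) == length:
            return prefix  # complete sequence that met every constraint
        probs = next_probs(prefix)  # one "model call"
        calls += 1
        ranked = sorted(probs.items(), key=lambda kv: -kv[1])
        required = constraints.get(len(prefix))
        if required is not None:
            # Constrained position: only the required chord may follow.
            ranked = [(tok, p) for tok, p in ranked if tok == required]
        else:
            # Free position: keep only the top candidates (beam limit).
            ranked = ranked[:beam_width]
        for tok, p in ranked:
            if p > 0.0:  # zero-probability continuations are dead ends
                heapq.heappush(frontier, (cost - math.log(p), prefix + (tok,)))
    return None  # budget exhausted or no feasible sequence


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Stand-in for a trained transformer: depends on the last token only."""
    table = {
        None: {"C": 0.7, "F": 0.3},
        "C": {"C": 0.6, "F": 0.4, "G": 0.0},
        "F": {"G": 0.9, "C": 0.1},
        "G": {"C": 1.0},
    }
    return table[prefix[-1] if prefix else None]


result = constrained_search(toy_model, length=3, constraints={1: "G"})
```

In this toy setting the most likely first token, `C`, cannot be followed by the required `G`, so that branch yields no children and the search falls back to the prefix starting with `F`. The `max_calls` budget mirrors the idea of bounding the number of model calls; the worst case remains exponential in the sequence length.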
