Explicit Tonal Tension Conditioning via Dual-Level Beam Search for Symbolic Music Generation

Maral Ebrahimzadeh, Gilberto Bernardes, Sebastian Stober

TL;DR

This work tackles explicit tonal tension control in symbolic music generation by integrating a Tonal Interval Vector (TIV)–based tonal tension metric into a Transformer framework and introducing a dual-level beam search. The token-level stage re-ranks candidates for quality and diversity, while the bar-level stage aligns completed bars with a user-specified tension curve, using a tension similarity measure to guide selection. Objective and subjective evaluations demonstrate significant improvements in tension alignment without sacrificing other musical qualities, and the method enables multiple distinct interpretations under the same tension condition. The approach offers a practical inference-time control mechanism that can be extended to broader expressivity and compositional targets in AI-assisted music creation.

Abstract

State-of-the-art symbolic music generation models have recently achieved remarkable output quality, yet explicit control over compositional features, such as tonal tension, remains challenging. We propose a novel approach that integrates a computational tonal tension model, based on tonal interval vector analysis, into a Transformer framework. Our method employs a two-level beam search strategy during inference. At the token level, generated candidates are re-ranked using model probability and diversity metrics to maintain overall quality. At the bar level, a tension-based re-ranking is applied to ensure that the generated music aligns with a desired tension curve. Objective evaluations indicate that our approach effectively modulates tonal tension, and subjective listening tests confirm that the system produces outputs that align with the target tension. These results demonstrate that explicit tension conditioning through a dual-level beam search provides an intuitive, inference-time tool for guiding AI-generated music. Our experiments further show that the method can generate multiple distinct musical interpretations under the same tension condition.
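The two-level search described in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: `toy_log_probs` is a hypothetical stand-in for the Transformer, tokens are bare pitch classes with a fixed number of tokens per bar, the diversity term is a simple penalty on repeated last tokens, the tension value is a rough proxy (one minus the normalised magnitude of a weighted chroma DFT, with illustrative weights), and the tension similarity is taken to be the absolute difference from the target value.

```python
import cmath
import math

# Illustrative DFT weights for coefficients k = 1..6 (assumed values).
WEIGHTS = (3.0, 8.0, 11.5, 15.0, 14.5, 7.5)
W_NORM = math.sqrt(sum(w * w for w in WEIGHTS))


def tiv_tension(pitch_classes):
    """Proxy tension in [0, 1]: 1 minus the normalised weighted-DFT magnitude."""
    chroma = [0.0] * 12
    for pc in pitch_classes:
        chroma[pc % 12] += 1.0
    total = sum(chroma)
    if total == 0:
        return 0.0
    mag_sq = 0.0
    for k, w in enumerate(WEIGHTS, start=1):
        coeff = sum(c / total * cmath.exp(-2j * math.pi * k * n / 12)
                    for n, c in enumerate(chroma))
        mag_sq += (w * abs(coeff)) ** 2
    return 1.0 - math.sqrt(mag_sq) / W_NORM


def toy_log_probs(sequence):
    """Hypothetical model stand-in: prefers small steps from the last pitch class."""
    last = sequence[-1] if sequence else 0
    scores = [-min((pc - last) % 12, (last - pc) % 12) / 3.0 for pc in range(12)]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return [s - log_z for s in scores]


def token_level_step(beams, beam_width, diversity_weight):
    """Expand each beam by one token; re-rank by log-prob minus a repeat penalty."""
    candidates = [(seq + [pc], score + lp)
                  for seq, score in beams
                  for pc, lp in enumerate(toy_log_probs(seq))]
    chosen, used = [], [0] * 12
    for _ in range(min(beam_width, len(candidates))):
        # Penalise candidates whose last token was already picked in this step.
        best = max(candidates,
                   key=lambda c: c[1] - diversity_weight * used[c[0][-1]])
        candidates.remove(best)
        chosen.append(best)
        used[best[0][-1]] += 1
    return chosen


def generate(target_curve, tokens_per_bar=4, beam_width=6,
             keep_per_bar=3, diversity_weight=0.5):
    """Run the token-level search bar by bar, then filter beams by tension fit."""
    beams = [([], 0.0)]
    for target in target_curve:
        for _ in range(tokens_per_bar):
            beams = token_level_step(beams, beam_width, diversity_weight)
        # Bar-level stage: keep beams whose newest bar is closest to the target.
        beams.sort(key=lambda b: abs(tiv_tension(b[0][-tokens_per_bar:]) - target))
        beams = beams[:keep_per_bar]
    return beams


curve = [0.2, 0.6, 0.3]  # one target tension value per bar (made-up values)
results = generate(curve)
best_sequence = results[0][0]
```

Because the bar-level stage keeps several beams rather than one, `results` holds more than one candidate that fits the same curve, which mirrors the abstract's point about multiple interpretations under a single tension condition.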

Paper Structure

This paper contains 16 sections, 4 equations, 2 figures, 2 tables, and 1 algorithm.

Figures (2)

  • Figure 1: The five tension curves used in the listening study.
  • Figure 2: Confusion matrix of listener identification of tonal tension curves (correct answers outlined in red). Curves are listed on the y-axis, and samples appear on the x-axis. Curve-to-sample assignments: Curve 1 (Samples B, H), Curve 2 (Samples C, J), Curve 3 (Samples F, I), Curve 4 (Samples A, D), Curve 5 (Samples E, G).