Structured Multi-Track Accompaniment Arrangement via Style Prior Modelling

Jingwei Zhao, Gus Xia, Ziyu Wang, Ye Wang

TL;DR

This paper introduces a system that leverages prior modelling over disentangled style factors to arrange rich, structured multi-track accompaniments from a simple lead sheet. The system achieves superior coherence, structure, and overall arrangement quality compared to existing baselines.

Abstract

In the realm of music AI, arranging rich and structured multi-track accompaniments from a simple lead sheet presents significant challenges. Such challenges include maintaining track cohesion, ensuring long-term coherence, and optimizing computational efficiency. In this paper, we introduce a novel system that leverages prior modelling over disentangled style factors to address these challenges. Our method presents a two-stage process: initially, a piano arrangement is derived from the lead sheet by retrieving piano texture styles; subsequently, a multi-track orchestration is generated by infusing orchestral function styles into the piano arrangement. Our key design is the use of vector quantization and a unique multi-stream Transformer to model the long-term flow of the orchestration style, which enables flexible, controllable, and structured music generation. Experiments show that by factorizing the arrangement task into interpretable sub-stages, our approach enhances generative capacity while improving efficiency. Additionally, our system supports a variety of music genres and provides style control at different composition hierarchies. We further show that our system achieves superior coherence, structure, and overall arrangement quality compared to existing baselines.
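The abstract names vector quantization as part of the key design. As a rough illustration only, the sketch below shows the generic idea of mapping continuous style vectors to discrete codebook indices by nearest-neighbour lookup. The codebook values, vector dimensions, and the names `quantize`, `codebook`, and `style_sequence` are hypothetical and are not taken from the paper.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to `vector`.

    Distance is squared Euclidean. This is a generic nearest-neighbour
    lookup, not the paper's trained quantizer.
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical 2-D codebook with three entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]

# Hypothetical continuous style vectors, one per time step.
style_sequence = [[0.1, -0.2], [0.9, 1.2], [1.8, 0.1]]

# Each style vector becomes a discrete token that a prior model could predict.
tokens = [quantize(s, codebook)[0] for s in style_sequence]
print(tokens)
```

Turning style vectors into discrete tokens in this way is what lets a sequence model treat the long-term flow of style as a token prediction problem; how the paper trains its codebook is not specified in this summary.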

Paper Structure (34 sections, 9 equations, 15 figures, 6 tables)


Figures (15)

  • Figure 1: The autoencoder architecture. It learns content representation $\mathbf{c}_t$ from piano reduction, style representations $\mathbf{s}_t^{1:K}$ from orchestral function, and leverages both to reconstruct individual tracks.
  • Figure 2: The prior model architecture. The overall architecture is an encoder-decoder Transformer, while the decoder module is interleaved with orthogonal time-wise and track-wise layers.
  • Figure 3: A complete accompaniment arrangement system based on cascaded prior modelling. The first stage models piano texture style given lead sheet while the second stage models orchestral function style given piano. Besides modularity, the system offers control on both composition levels.
  • Figure 4: Arrangement for Can You Feel the Love Tonight, a pop song of 60 bars in total. We show two chorus parts, from bars 13 to 41. Red dotted boxes show coherence in long-term structure. Coloured blocks show naturalness and cohesion in the multi-track arrangement.
  • Figure 5: Subjective evaluation results on lead sheet to multi-track arrangement (Section \ref{subject_arr_section}).
  • ...and 10 more figures
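
The Figure 2 caption describes a decoder interleaved with orthogonal time-wise and track-wise layers. One way to picture this, assuming the style tokens form a grid of tracks by time steps, is that a time-wise layer mixes positions along the time axis within each track, while a track-wise layer mixes positions across tracks within each time step. The sketch below only enumerates which grid positions would interact in each layer type. The function names and the grid layout are assumptions for illustration, and details such as causal masking are not covered.

```python
def time_wise_groups(num_tracks, num_steps):
    """Positions that interact in a time-wise layer: one group per track."""
    return [[(k, t) for t in range(num_steps)] for k in range(num_tracks)]


def track_wise_groups(num_tracks, num_steps):
    """Positions that interact in a track-wise layer: one group per time step."""
    return [[(k, t) for k in range(num_tracks)] for t in range(num_steps)]


# Hypothetical grid: 2 tracks, 3 time steps; positions are (track, step) pairs.
time_groups = time_wise_groups(2, 3)
track_groups = track_wise_groups(2, 3)

print(time_groups)   # each inner list spans all time steps of one track
print(track_groups)  # each inner list spans all tracks at one time step
```

Alternating the two groupings lets information travel along both axes without every position attending to every other position in a single layer, which is one plausible reading of the efficiency claim in the abstract.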