SyncTrack: Rhythmic Stability and Synchronization in Multi-Track Music Generation

Hongrui Wang; Fan Zhang; Zhiyuan Yu; Ziya Zhou; Xi Chen; Can Yang; Yang Wang

TL;DR

Experiments demonstrate that SyncTrack significantly improves multi-track music quality by enhancing rhythmic consistency, as measured by three novel metrics: Inner-track Rhythmic Stability (IRS), Cross-track Beat Synchronization (CBS), and Cross-track Beat Dispersion (CBD).

Abstract

Multi-track music generation has garnered significant research interest because it enables precise mixing and remixing. However, existing models often overlook essential attributes such as rhythmic stability and synchronization, focusing on the differences between tracks rather than on their inherent properties. In this paper, we introduce SyncTrack, a synchronous multi-track waveform music generation model designed to capture the unique characteristics of multi-track music. SyncTrack features a novel architecture that includes track-shared modules to establish a common rhythm across all tracks and track-specific modules to accommodate diverse timbres and pitch ranges. Each track-shared module employs two cross-track attention mechanisms to synchronize rhythmic information, while each track-specific module uses learnable instrument priors to better represent timbre and other unique features. Additionally, we extend the evaluation of multi-track music quality to cover rhythmic consistency by introducing three novel metrics: Inner-track Rhythmic Stability (IRS), Cross-track Beat Synchronization (CBS), and Cross-track Beat Dispersion (CBD). Experiments demonstrate that SyncTrack significantly improves multi-track music quality by enhancing rhythmic consistency.
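
To make the two architectural ideas in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes latents are stored as `latents[track][time_step]` feature vectors, that cross-track attention lets every track attend over all tracks at the same time step, and that an instrument prior is one vector per track added at every time step. The function names `cross_track_attention` and `add_instrument_prior` are hypothetical, and the learned query/key/value projections a real model would use are omitted for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_track_attention(latents):
    """Mix information across tracks at each time step (illustrative only).

    latents[s][t] is the feature vector of track s at time step t.
    Assumption: queries, keys and values are the raw features; a trained
    model would apply learned projections first.
    """
    n_tracks = len(latents)
    n_steps = len(latents[0])
    dim = len(latents[0][0])
    scale = 1.0 / math.sqrt(dim)
    out = [[None] * n_steps for _ in range(n_tracks)]
    for t in range(n_steps):
        # All tracks at the same time step form one attention "sequence".
        frame = [latents[s][t] for s in range(n_tracks)]
        for s in range(n_tracks):
            scores = [
                scale * sum(a * b for a, b in zip(frame[s], other))
                for other in frame
            ]
            weights = softmax(scores)
            out[s][t] = [
                sum(w * other[d] for w, other in zip(weights, frame))
                for d in range(dim)
            ]
    return out


def add_instrument_prior(latents, priors):
    """Add a per-track vector priors[s] to every time step of track s.

    Assumption: in a trained model these vectors would be learnable
    parameters; here they are plain lists supplied by the caller.
    """
    return [
        [[x + p for x, p in zip(vec, priors[s])] for vec in track]
        for s, track in enumerate(latents)
    ]


# Two tracks, two time steps, two features per step.
example = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.0, 1.0]]]
mixed = cross_track_attention(example)
with_prior = add_instrument_prior(mixed, [[0.1, 0.0], [0.0, 0.1]])
```

Because attention here runs over the track axis rather than the time axis, each output vector is a convex combination of the features that all tracks hold at the same instant, which is one plausible way for tracks to share timing information.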

Paper Structure (26 sections, 8 equations, 12 figures, 12 tables)

Introduction
Related Work
Rhythmic Stability and Synchronization
Audio Generation and Multi-Track Waveform Music Generation
Synchronized Generation in Audio Related Domains
Objective Evaluation Metrics for Multi-Track Music Generation
SyncTrack: Synchronous Multi-track Music Generation Model
Metrics for Rhythmic Stability and Synchronization
Experiments
Experimental Setting
Quality of Generated Multi-track Music (RQ1)
Rhythmic Stability and Synchronization (RQ2)
Ablation Study (RQ3)
Robustness and Subjective Validation of Evaluation Metrics (RQ4)
Conclusion
...and 11 more sections

Figures (12)

Figure 1: (a) Previous methods [mariani2023multi, karchkhadze2025simultaneous] leverage a unified model to learn the joint distribution of multi-track audio stems. (b) In contrast, our proposed SyncTrack incorporates both track-shared modules and track-specific modules to model the information that is common to all tracks and specific to each track.
Figure 2: (a) Overall pipeline of SyncTrack. Training pipeline: We train a four-track latent diffusion model. Each track is perturbed according to the $l$-th signal-to-noise ratio, and the model is optimized to predict the added noise $\epsilon \sim \mathcal{N}(0,I)$. More details are in Section 3.1. Inference pipeline: At test time, four-track latents are generated and then decoded into audio. (b) SyncTrack consists of input, mid, and output blocks, which contain track-specific modules and track-shared modules.
Figure 3: Illustration of the (a) track-shared module and (b) track-specific module. In (a), we leverage inner-track attention to capture the inner-track rhythmic stability and devise (c) two cross-track attention submodules to capture cross-track rhythmic stability and synchronization. In (b), we construct a learnable instrument prior to capture timbre and other track-specific features.
Figure 4: Comparison of subjective ratings and objective metric scores.
Figure A1: Comparison of IRS across hyperparameter settings,
...and 7 more figures
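
The training description in the Figure 2 caption reads like the standard noise-prediction objective for diffusion models. As a sketch under that assumption only, it could be written as follows. The symbols $z_0^{(s)}$ (clean latent of track $s$), $\bar\alpha_l$ (cumulative noise schedule at level $l$), and $\epsilon_\theta$ (the denoising network) are introduced here for illustration; the paper's own formulation is in its Section 3.1 and may differ.

```latex
\begin{align}
z_l^{(s)} &= \sqrt{\bar\alpha_l}\, z_0^{(s)} + \sqrt{1-\bar\alpha_l}\, \epsilon^{(s)},
  \qquad \epsilon^{(s)} \sim \mathcal{N}(0, I), \quad s = 1, \dots, 4, \\
\mathrm{SNR}(l) &= \frac{\bar\alpha_l}{1-\bar\alpha_l}, \\
\mathcal{L} &= \mathbb{E}_{z_0,\, \epsilon,\, l}
  \left[ \sum_{s=1}^{4} \left\| \epsilon^{(s)}
  - \epsilon_\theta^{(s)}\!\left(z_l^{(1)}, \dots, z_l^{(4)},\, l\right) \right\|_2^2 \right].
\end{align}
```

In this form the network sees all four noisy track latents jointly, which is what would allow track-shared modules to exchange information, while the loss is still accumulated per track.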
