Source Separation of Multi-source Raw Music using a Residual Quantized Variational Autoencoder

Leonardo Berti

TL;DR

This work introduces a Residual Quantized Variational Autoencoder (RQ-VAE) for source separation of multi-source raw music, demonstrated on the Slakh2100 dataset. The model compresses audio into hierarchical discrete latent codes and supports single-step inference. As a result, it approaches state-of-the-art separation performance at a far lower computational cost than existing methods that require many inference steps. The work also explores an RQTransformer for generating music in the latent space, although the generation quality is not yet satisfactory. Overall, the RQ-VAE provides an efficient neural audio codec capable of effective source separation, with potential for broader applications in audio generation and compression.

Abstract

I developed a neural audio codec model based on the residual quantized variational autoencoder (RQ-VAE) architecture. I trained it on Slakh2100, a standard multi-track audio dataset for musical source separation. The model can separate audio sources, achieving results close to the state of the art with much less computing power. The code is publicly available at github.com/LeonardoBerti00/Source-Separation-of-Multi-source-Music-using-Residual-Quantizad-Variational-Autoencoder
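To make the "residual quantized" part of the architecture concrete, here is a minimal, generic sketch of residual vector quantization (RVQ) in plain Python. It is an illustrative assumption, not the code from the linked repository. The helper names `nearest`, `rvq_encode` and `rvq_decode` are hypothetical, and the two tiny codebooks are hand-picked. In an RQ-VAE the codebooks are learned, and the quantized vectors would be encoder outputs for each audio frame. Each stage quantizes the residual left by the previous stages, so the code indices form a coarse-to-fine hierarchy.

```python
from typing import List, Tuple

Vector = List[float]
Codebook = List[Vector]


def squared_error(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: Codebook, v: Vector) -> Tuple[int, Vector]:
    """Return (index, codeword) of the codeword closest to v."""
    best = min(range(len(codebook)), key=lambda i: squared_error(codebook[i], v))
    return best, codebook[best]


def rvq_encode(v: Vector, codebooks: List[Codebook]) -> List[int]:
    """Residual VQ: each stage quantizes what the previous stages missed."""
    residual = list(v)
    codes = []
    for cb in codebooks:
        idx, codeword = nearest(cb, residual)
        codes.append(idx)
        # Subtract the chosen codeword; the next stage sees only the remainder.
        residual = [r - c for r, c in zip(residual, codeword)]
    return codes


def rvq_decode(codes: List[int], codebooks: List[Codebook]) -> Vector:
    """Reconstruction is the sum of the selected codewords across stages."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


# Toy, hand-picked codebooks (hypothetical; learned in a real RQ-VAE).
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],                               # stage 1: coarse
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, 0.0], [0.0, -0.25]],  # stage 2: fine
]

x = [1.2, 0.9]
codes = rvq_encode(x, codebooks)
coarse = rvq_decode(codes[:1], codebooks[:1])  # first stage only
fine = rvq_decode(codes, codebooks)            # both stages
print(codes, squared_error(x, coarse), squared_error(x, fine))
```

With these toy codebooks, the vector [1.2, 0.9] is encoded as the codes [1, 1]. The squared reconstruction error drops from about 0.05 with the first stage alone to about 0.0125 with both stages. This illustrates why a residual quantizer can represent a signal more precisely by adding stages rather than by enlarging a single codebook.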
