LVNS-RAVE: Diversified audio generation with RAVE and Latent Vector Novelty Search
Jinyue Guo, Anna-Maria Christodoulou, Balint Laczko, Kyrre Glette
TL;DR
The paper tackles the tension between realism and creativity in audio generation by combining a deep generative model (RAVE) with Latent Variable Evolution guided by a novelty objective. It evolves latent-space vectors $s_i \in \mathbb{R}^{d\times l}$, which the RAVE decoder $RDec$ turns into audio, and measures a VGGish-based perceptual distance $dist(s_i,s_j)=\left\| \text{VGGish}(RDec(s_i)) - \text{VGGish}(RDec(s_j))\right\|$. The novelty of a sample is its mean distance to a set $U$ of $k=50$ samples, $\rho(s_i)=\frac{1}{k}\sum_{j\in U} dist(s_i,s_j)$, and each generation keeps the $N$ samples that maximize total novelty. By initializing, crossbreeding, mutating, and selecting in the RAVE latent space, the method seeks outputs that are diverse yet realistic; it is evaluated on three pre-trained RAVE models under multiple mutation setups. The reported results show diversity increasing over generations, and the mutation parameters give artists a way to balance fidelity against novelty.
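The novelty score above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the rows of `emb` stand in for the embeddings $\text{VGGish}(RDec(s_i))$, which are assumed to be precomputed; $U$ is assumed to be the $k$ nearest neighbours of each sample, as is common in novelty search; and the top-$N$ ranking in `select_most_novel` is a simple hypothetical stand-in for the selection step.

```python
import numpy as np

def pairwise_dist(emb):
    """Euclidean distance between every pair of embedding rows."""
    diff = emb[:, None, :] - emb[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def novelty(emb, k=50):
    """Mean distance from each sample to its k nearest neighbours.

    Assumption: the set U in the novelty formula is taken to be the
    k nearest neighbours of each sample in embedding space.
    """
    d = pairwise_dist(emb)
    np.fill_diagonal(d, np.inf)       # a sample is not its own neighbour
    k = min(k, len(emb) - 1)          # guard against small populations
    nearest = np.sort(d, axis=1)[:, :k]
    return nearest.mean(axis=1)

def select_most_novel(emb, n, k=50):
    """Indices of the n samples with the highest novelty (hypothetical rule)."""
    rho = novelty(emb, k)
    return np.argsort(rho)[::-1][:n]
```

With four identical embeddings and one outlier, the outlier receives the highest novelty and is the first sample selected.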
Abstract
Evolutionary Algorithms and Generative Deep Learning have been two of the most powerful tools for sound generation tasks. However, both have limitations: Evolutionary Algorithms require complicated designs, which makes them hard to control and makes realistic sound generation difficult, while Generative Deep Learning models often copy from the dataset and lack creativity. In this paper, we propose LVNS-RAVE, a method that combines Evolutionary Algorithms and Generative Deep Learning to produce realistic and novel sounds. We use the RAVE model as the sound generator and the VGGish model as a novelty evaluator in the Latent Vector Novelty Search (LVNS) algorithm. The reported experiments show that the method can successfully generate diversified, novel audio samples under different mutation setups using different pre-trained RAVE models. The characteristics of the generation process can be easily controlled with the mutation parameters. The proposed algorithm can be a creative tool for sound artists and musicians.
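To make the role of the mutation parameters concrete, the sketch below shows one plausible way to crossbreed and mutate latent vectors of shape $d \times l$. The operators are assumptions for illustration only: the abstract does not specify them, so uniform crossover, Gaussian noise, and the parameter names `rate` and `sigma` are hypothetical choices rather than the paper's definitions.

```python
import numpy as np

def crossbreed(parent_a, parent_b, rng):
    """Uniform crossover (assumed): each entry is copied from one parent."""
    mask = rng.random(parent_a.shape) < 0.5
    return np.where(mask, parent_a, parent_b)

def mutate(latent, rate, sigma, rng):
    """Gaussian mutation (assumed).

    rate  -- hypothetical probability that a given entry is perturbed
    sigma -- hypothetical standard deviation of the added noise
    """
    mask = rng.random(latent.shape) < rate
    noise = rng.normal(0.0, sigma, size=latent.shape)
    return latent + mask * noise

rng = np.random.default_rng(0)
a = rng.normal(size=(16, 32))   # two parent latents of shape d x l
b = rng.normal(size=(16, 32))
child = mutate(crossbreed(a, b, rng), rate=0.1, sigma=0.5, rng=rng)
```

In this sketch, a larger `rate` or `sigma` moves offspring further from their parents in latent space, which is one way a mutation setup could trade fidelity for novelty.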
