VampNet: Music Generation via Masked Acoustic Token Modeling
Hugo Flores Garcia, Prem Seetharaman, Rithesh Kumar, Bryan Pardo
TL;DR
VampNet targets fast, flexible music generation without autoregressive decoding: it models masked acoustic tokens and samples them with parallel iterative decoding. A DAC-based audio tokenizer feeds two-stage bidirectional transformers that predict masked token sequences, so the same model supports both compression and creative variation depending on how the input tokens are masked (the prompt). The key contributions are the masked acoustic token modeling framework, a variable masking schedule for training, a confidence-based sampling loop, and a suite of prompting strategies, including beat-driven and periodic prompts, that interpolate between faithful reconstruction and free generation. Empirically, VampNet produces coherent, high-fidelity audio with as few as 36 sampling passes, and beat-driven prompts yield the best Fréchet Audio Distance (FAD). The small number of passes brings generation closer to real time than autoregressive baselines, which suggests practical use in interactive music co-creation.
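To make the idea of a token-based prompt concrete, the following is a minimal sketch of a periodic prompt. It assumes that a periodic prompt keeps every P-th time step of the token grid as conditioning and masks everything else; the function name `periodic_prompt_mask`, the grid shape, and the convention that `True` means "masked" are illustrative choices, not the authors' code.

```python
import numpy as np

def periodic_prompt_mask(n_codebooks: int, n_steps: int, period: int) -> np.ndarray:
    """Build a boolean mask over a (codebooks x time) token grid.

    True  = masked, i.e. the model must generate this token.
    False = kept, i.e. the token is given to the model as the prompt.
    Assumption: every `period`-th time step is kept across all codebooks.
    """
    if period < 1:
        raise ValueError("period must be a positive integer")
    mask = np.ones((n_codebooks, n_steps), dtype=bool)
    mask[:, ::period] = False  # keep time steps 0, period, 2*period, ...
    return mask

# Hypothetical sizes chosen only for illustration.
mask = periodic_prompt_mask(n_codebooks=4, n_steps=16, period=4)
kept_steps = np.flatnonzero(~mask[0])   # time steps that remain as prompt
masked_fraction = mask.mean()           # share of tokens left to the model
```

Under this assumption, a small period keeps most tokens and pushes the output toward reconstruction, while a large period leaves the model freer to generate new material.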
Abstract
We introduce VampNet, a masked acoustic token modeling approach to music synthesis, compression, inpainting, and variation. We use a variable masking schedule during training which allows us to sample coherent music from the model by applying a variety of masking approaches (called prompts) during inference. VampNet is non-autoregressive, leveraging a bidirectional transformer architecture that attends to all tokens in a forward pass. With just 36 sampling passes, VampNet can generate coherent high-fidelity musical waveforms. We show that by prompting VampNet in various ways, we can apply it to tasks like music compression, inpainting, outpainting, continuation, and looping with variation (vamping). Appropriately prompted, VampNet is capable of maintaining style, genre, instrumentation, and other high-level aspects of the music. This flexible prompting capability makes VampNet a powerful music co-creation tool. Code and audio samples are available online.
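The sampling procedure described above can be sketched as a confidence-based iterative decoding loop. This is an illustrative sketch under stated assumptions, not the released implementation: `toy_model` is a random stand-in for the bidirectional transformer, `MASK` is a hypothetical sentinel value, a single token stream replaces the multi-codebook grid, and the cosine re-masking schedule is borrowed from MaskGIT-style decoding rather than confirmed by the text here.

```python
import numpy as np

MASK = -1  # hypothetical sentinel marking a masked position

def toy_model(tokens: np.ndarray, vocab_size: int, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for the transformer: returns a (length x vocab) probability table."""
    logits = rng.normal(size=(tokens.shape[0], vocab_size))
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def iterative_decode(tokens: np.ndarray, vocab_size: int,
                     n_passes: int = 36, seed: int = 0) -> np.ndarray:
    """Fill masked positions over several parallel passes, keeping confident guesses."""
    rng = np.random.default_rng(seed)
    tokens = tokens.copy()
    n_masked_start = int((tokens == MASK).sum())
    for step in range(n_passes):
        masked = tokens == MASK
        if not masked.any():
            break
        probs = toy_model(tokens, vocab_size, rng)
        # Sample a candidate token for every position in parallel.
        sampled = np.array([rng.choice(vocab_size, p=p) for p in probs])
        conf = probs[np.arange(tokens.shape[0]), sampled]
        # Prompt tokens and already committed tokens are never re-masked.
        conf = np.where(masked, conf, np.inf)
        tokens = np.where(masked, sampled, tokens)
        # Cosine schedule (assumed): how many tokens stay masked after this pass.
        frac = np.cos(np.pi / 2 * (step + 1) / n_passes)
        n_remask = int(np.floor(frac * n_masked_start))
        if n_remask > 0:
            lowest = np.argsort(conf)[:n_remask]  # least confident candidates
            tokens[lowest] = MASK
    return tokens

# Usage: keep every 4th token as a prompt and let the loop fill in the rest.
prompt = np.full(16, MASK)
prompt[::4] = [3, 1, 4, 1]
out = iterative_decode(prompt, vocab_size=8)
```

Each pass scores every position at once, so the number of model calls is fixed by `n_passes` rather than by sequence length, which is the property that separates this scheme from token-by-token autoregressive decoding.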
