Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls
Liwei Lin, Gus Xia, Yixiao Zhang, Junyan Jiang
TL;DR
The paper addresses the lack of long-range, controllable music editing in autoregressive models by introducing AIRGen, a parameter-efficient heterogeneous adapter that turns MusicGen into a masked LM capable of inpainting, arrangement, and refinement. It combines a novel four-adapter-per-layer design with a masking training scheme and frame-level content-based controls, enabling conditioning on drum tracks, chord progressions, and piano covers while keeping most of the base model frozen. Experiments on Slakh2100 and RWC-POP100 show competitive inpainting quality, strong steerability, and efficient fine-tuning with lightweight adapters, including favorable long-gap performance and robustness to masking patterns. The work advances practical, steerable long-term music editing at reduced computational cost and lays groundwork for richer content-based control in AI-driven music tools.
Abstract
Controllable music generation plays a vital role in human-AI music co-creation. While Large Language Models (LLMs) have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To address this gap, we propose a novel approach leveraging a parameter-efficient heterogeneous adapter combined with a masking training scheme. This approach enables autoregressive language models to seamlessly address music inpainting tasks. Additionally, our method integrates frame-level content-based controls, facilitating track-conditioned music refinement and score-conditioned music arrangement. We apply this method to fine-tune MusicGen, a leading autoregressive music generation model. Our experiments demonstrate promising results across multiple music editing tasks, offering more flexible controls for future AI-driven music editing tools. The source code and a demo page showcasing our work are available at https://kikyo-16.github.io/AIR.
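To make the inpainting setup concrete, the following is a minimal sketch of how a contiguous span of frame-level audio tokens might be hidden when building a masked training example. It is not the authors' implementation: the sentinel `MASK_ID`, the helper names `mask_span` and `random_span`, and the token layout (one list of integer tokens per codebook, indexed by frame) are all assumptions made for illustration.

```python
import random

MASK_ID = -1  # hypothetical sentinel for frames the model must infill


def mask_span(tokens, start, length, mask_id=MASK_ID):
    """Hide frames [start, start + length) in every codebook.

    tokens: list of per-codebook token lists, all with the same number of frames
            (an assumed layout, not taken from the paper).
    Returns (masked_tokens, mask), where mask[t] is True for hidden frames.
    """
    num_frames = len(tokens[0])
    if start < 0 or length <= 0 or start + length > num_frames:
        raise ValueError("span must lie inside the sequence")
    mask = [start <= t < start + length for t in range(num_frames)]
    masked = [
        [mask_id if hidden else tok for tok, hidden in zip(row, mask)]
        for row in tokens
    ]
    return masked, mask


def random_span(num_frames, min_len, max_len, rng):
    """Sample a (start, length) pair whose span fits inside num_frames."""
    length = rng.randint(min_len, max_len)        # inclusive on both ends
    start = rng.randint(0, num_frames - length)   # keeps the span in range
    return start, length


# Usage: hide a random gap in a toy two-codebook, eight-frame sequence.
rng = random.Random(0)
toy_tokens = [list(range(8)), list(range(100, 108))]
start, length = random_span(len(toy_tokens[0]), 2, 4, rng)
masked_tokens, mask = mask_span(toy_tokens, start, length)
```

In a setup like this, the unmasked frames would serve as context and the training loss would be computed only where `mask` is True; how the paper actually constructs and conditions on masks may differ.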
