JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning

Boyu Chen, Peike Li, Yao Yao, Alex Wang

TL;DR

The paper tackles customized text-to-music generation: teaching a pretrained model a user-specific musical concept from a short (two-minute) reference track while avoiding overfitting and conflicts between concepts. It introduces Pivotal Parameters Tuning, which fine-tunes only a sparse set of pivotal parameters, and uses multiple trainable concept identifier tokens $V^*$ to support multi-concept prompts, together with a concept enhancement strategy. The authors also contribute a new dataset and evaluation protocol for the task, on which JEN-1 DreamStyler outperforms several baselines in both qualitative and quantitative evaluations. The approach enables concept-driven music generation from minimal input, with potential applications in personalized music creation.
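To make "fine-tunes only a sparse set of pivotal parameters" concrete, the following minimal Python sketch applies a plain SGD step only to tensors in a chosen trainable set and leaves every other tensor at its pretrained value. It is an illustration under stated assumptions, not the authors' training code: parameters are modeled as plain lists of floats, the optimizer is vanilla SGD, and the function `sparse_sgd_step` and the tensor names (`attn1.to_v`, `attn1.to_q`) are hypothetical.

```python
from typing import Dict, List, Set

# A "tensor" is modeled as a flat list of floats, keyed by a parameter name.
Params = Dict[str, List[float]]


def sparse_sgd_step(params: Params, grads: Params, trainable: Set[str], lr: float) -> Params:
    """One SGD step that updates only the tensors named in `trainable`.

    Hypothetical helper for illustration; all other tensors stay frozen.
    """
    updated: Params = {}
    for name, values in params.items():
        if name in trainable:
            # Selected (pivotal) tensor: apply w <- w - lr * g element-wise.
            updated[name] = [w - lr * g for w, g in zip(values, grads[name])]
        else:
            # Frozen tensor: copy the pretrained values unchanged.
            updated[name] = list(values)
    return updated


# Toy example with hypothetical tensor names and gradients.
params = {"attn1.to_v": [1.0, 2.0], "attn1.to_q": [3.0, 4.0]}
grads = {"attn1.to_v": [0.5, -0.5], "attn1.to_q": [1.0, 1.0]}
new_params = sparse_sgd_step(params, grads, trainable={"attn1.to_v"}, lr=0.5)
print(new_params)  # {'attn1.to_v': [0.75, 2.25], 'attn1.to_q': [3.0, 4.0]}
```

Because frozen tensors are never modified, whatever the base model encoded in them is retained exactly; only the selected tensors absorb the new concept.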

Abstract

Large models for text-to-music generation have made significant progress, enabling the creation of high-quality and varied musical compositions from text prompts. However, text prompts may not precisely capture user requirements, particularly when the goal is to generate music that embodies a specific concept derived from a designated reference collection. In this paper, we propose a novel method for customized text-to-music generation that captures a concept from two minutes of reference music and generates new music conforming to that concept. We achieve this by fine-tuning a pretrained text-to-music model on the reference music. However, directly fine-tuning all parameters leads to overfitting. To address this problem, we propose a Pivotal Parameters Tuning method that enables the model to assimilate the new concept while preserving its original generative capabilities. Additionally, we identify a potential concept conflict when introducing multiple concepts into the pretrained model. We present a concept enhancement strategy to distinguish multiple concepts, enabling the fine-tuned model to generate music incorporating either individual concepts or multiple concepts simultaneously. Since we are the first to work on the customized music generation task, we also introduce a new dataset and evaluation protocol for it. Our proposed JEN-1 DreamStyler outperforms several baselines in both qualitative and quantitative evaluations. Demos will be available at https://www.jenmusic.ai/research#DreamStyler.
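The idea of learning several concepts that can be used alone or together can be sketched as giving each concept its own disjoint group of trainable identifier tokens (standing in for $V^*$) and splicing the requested groups into a prompt. This minimal Python sketch only manages token strings; in an actual model, each new token would also add a trainable row to the text encoder's embedding table. The class `ConceptTokenRegistry`, the `<v{i}_{j}>` token format, and the choice of two tokens per concept are hypothetical, and the sketch does not reproduce the paper's concept enhancement strategy, whose details are not given here.

```python
from typing import Dict, List


class ConceptTokenRegistry:
    """Hypothetical helper: assigns a group of new identifier tokens to each concept."""

    def __init__(self, tokens_per_concept: int = 2) -> None:
        self.tokens_per_concept = tokens_per_concept
        self.concepts: Dict[str, List[str]] = {}

    def register(self, concept: str) -> List[str]:
        # Re-registering a concept returns its existing tokens.
        if concept in self.concepts:
            return self.concepts[concept]
        # Each concept gets its own disjoint token group, so no two concepts
        # share an identifier (and hence a trainable embedding).
        idx = len(self.concepts)
        tokens = [f"<v{idx}_{j}>" for j in range(self.tokens_per_concept)]
        self.concepts[concept] = tokens
        return tokens

    def prompt(self, template: str, concepts: List[str]) -> str:
        # Fill the "{concepts}" slot with the token groups of all requested concepts.
        ids = " ".join(" ".join(self.concepts[c]) for c in concepts)
        return template.format(concepts=ids)


registry = ConceptTokenRegistry(tokens_per_concept=2)
registry.register("guitar")
registry.register("jazz")
print(registry.prompt("a {concepts} track", ["guitar"]))
# a <v0_0> <v0_1> track
print(registry.prompt("a {concepts} track", ["guitar", "jazz"]))
# a <v0_0> <v0_1> <v1_0> <v1_1> track
```

Keeping the token groups disjoint is one simple way to let a single fine-tuned model address each concept separately or combine them in one prompt.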

Paper Structure

This paper contains 6 sections, 6 equations, and 3 figures.

Figures (3)

  • Figure 1: Using only two minutes of reference music representing a new concept, our proposed JEN-1 DreamStyler can understand and reproduce that musical concept. Reference musical concepts can be an instrument (e.g., guitar), a genre (e.g., jazz), and so on. JEN-1 DreamStyler is not limited to mastering a single musical concept; it can also integrate and generalize multiple musical concepts simultaneously.
  • Figure 2: Given reference music of novel musical concepts, we select and fine-tune the most pivotal parameters within the U-Net module of our text-to-music diffusion model. Furthermore, to enhance its discriminative capabilities, we introduce several trainable concept identifier tokens, denoted as $V^*$, to represent these new concepts. During training, we tune the pivotal value projection parameters in the self-attention layers and all key and value projection parameters in the cross-attention layers, together with the concept identifier tokens. For simplicity, only the single-concept case is illustrated.
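The trainable set described in the Figure 2 caption (all key and value projections in cross-attention, plus the pivotal value projections in self-attention) can be expressed as a name filter over the U-Net's parameters. The sketch below assumes diffusers-style naming, where `attn1` denotes self-attention and `attn2` denotes cross-attention, and reads "pivotal" as a ranked subset of the self-attention value projections; if all of them are meant, set `keep_fraction` to 1.0. The paper's actual criterion for deciding which parameters are pivotal is not described in this summary, so `select_trainable`, `self_attn_v_scores`, `keep_fraction`, and the example names and scores are all hypothetical.

```python
import math
from typing import Dict, List, Set


def select_trainable(
    param_names: List[str],
    self_attn_v_scores: Dict[str, float],
    keep_fraction: float,
) -> Set[str]:
    """Hypothetical selector mirroring the Figure 2 trainable set.

    Assumes diffusers-style names: "attn1" = self-attention, "attn2" = cross-attention.
    """
    trainable: Set[str] = set()
    # All key and value projections in cross-attention are trainable.
    for name in param_names:
        if ".attn2.to_k." in name or ".attn2.to_v." in name:
            trainable.add(name)
    # Self-attention value projections: keep only the top-scoring fraction.
    self_v = [n for n in param_names if ".attn1.to_v." in n]
    self_v.sort(key=lambda n: self_attn_v_scores[n], reverse=True)
    k = math.ceil(keep_fraction * len(self_v))
    trainable.update(self_v[:k])
    return trainable


names = [
    "down.0.attn1.to_q.weight",
    "down.0.attn1.to_v.weight",
    "down.0.attn2.to_k.weight",
    "down.0.attn2.to_v.weight",
    "mid.attn1.to_v.weight",
    "up.0.attn1.to_v.weight",
    "up.0.attn2.to_q.weight",
]
# Made-up importance scores for the self-attention value projections.
scores = {
    "down.0.attn1.to_v.weight": 0.9,
    "mid.attn1.to_v.weight": 0.1,
    "up.0.attn1.to_v.weight": 0.4,
}
chosen = select_trainable(names, scores, keep_fraction=0.5)
# Cross-attention K/V in down.0, plus the two highest-scoring self-attention V tensors.
print(sorted(chosen))
```

One reading of this design: the cross-attention key and value projections take the text embeddings (and therefore the $V^*$ tokens) as input, so keeping them fully trainable targets the conditioning path, while ranking the self-attention projections keeps the number of updated parameters small.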