Modular Customization of Diffusion Models via Blockwise-Parameterized Low-Rank Adaptation

Mingkang Zhu, Xi Chen, Zhongdao Wang, Bei Yu, Hengshuang Zhao, Jiaya Jia

TL;DR

The paper tackles modular, scalable customization of diffusion models: merging multiple user-trained concepts without degrading their identities. It identifies identity interference and identity loss as the key obstacles and introduces BlockLoRA, which combines Randomized Output Erasure and Blockwise LoRA Parameterization to enforce niche concept representations and disjoint parameter updates. This enables instant merging of up to 15 concepts with high fidelity, as shown by CLIP-based evaluations and qualitative comparisons against strong baselines. The approach offers a plug-and-play route to multi-concept customization and concept stylization, with practical implications for decentralized concept sharing and collaboration.

Abstract

Recent diffusion model customization has shown impressive results in incorporating subject or style concepts from a handful of images. However, modular composition of multiple concepts into a customized model, which aims to efficiently merge concepts trained in a decentralized manner without affecting their identities, remains unresolved. Modular customization is essential for applications such as concept stylization and multi-concept customization using concepts trained by different users. Existing post-training methods are confined to a fixed set of concepts, and any different combination requires a new round of retraining. In contrast, instant merging methods often cause identity loss and interference among the merged concepts and are usually limited to a small number of concepts. To address these issues, we propose BlockLoRA, an instant merging method designed to efficiently combine multiple concepts while accurately preserving each concept's identity. Based on a careful analysis of the underlying cause of interference, we develop the Randomized Output Erasure technique to minimize interference between different customized models. Additionally, we propose Blockwise LoRA Parameterization to reduce identity loss during instant model merging. Extensive experiments validate the effectiveness of BlockLoRA, which can instantly merge 15 concepts spanning people, subjects, scenes, and styles with high fidelity.
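
The idea of disjoint parameter updates can be illustrated with a toy example. The sketch below is not the paper's parameterization: it assumes, purely for illustration, that each concept's update is a flat vector that is nonzero only inside its own contiguous block, and that instant merging is an element-wise sum. The names `make_block_update` and `merge_by_sum` are hypothetical.

```python
import random


def make_block_update(num_params, num_concepts, concept_idx, rng):
    """Return a flat update that is nonzero only inside one concept's block.

    Assumption: parameters are split into equal contiguous blocks, one per concept.
    """
    block = num_params // num_concepts
    start = concept_idx * block
    update = [0.0] * num_params
    for i in range(start, start + block):
        update[i] = rng.uniform(-1.0, 1.0)
    return update


def merge_by_sum(updates):
    """Model instant merging as an element-wise sum of per-concept updates."""
    return [sum(values) for values in zip(*updates)]


rng = random.Random(0)
num_params, num_concepts = 12, 3
updates = [
    make_block_update(num_params, num_concepts, k, rng)
    for k in range(num_concepts)
]
merged = merge_by_sum(updates)

block = num_params // num_concepts
for k, update in enumerate(updates):
    lo, hi = k * block, (k + 1) * block
    # Inside its own block, the merged vector equals the concept's update exactly,
    # because every other concept contributes zeros there.
    assert merged[lo:hi] == update[lo:hi]
```

Under this assumption no concept's values are altered by the merge, which is the intuition behind using disjoint updates to limit identity loss; how the blocks are actually laid out inside the LoRA matrices is specified in the paper itself.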

Paper Structure

This paper contains 11 sections, 7 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Comparison of the two main existing approaches with ours for merging multiple concepts into a single pre-trained model. Our method supports instant merging without hurting identities.
  • Figure 2: Instantly merging 15 concepts without hurting identities. Given a set of concepts covering characters, objects, scenes, or styles, BlockLoRA can integrate any subset of their identities instantly and accurately into pre-trained diffusion models without identity loss or interference, so meaningful interactions among the merged concepts can be generated smoothly.
  • Figure 3: Visualization of prior class distribution drift: (a) the pre-trained model's prior class generations; (b) the customized models' concept generations and prior class generations; (c) the merged model's concept generations and prior class generations. After fine-tuning, the prior class generations drift significantly towards the concepts. After merging, both the prior class generations and the concept identities are entangled.
  • Figure 4: Visualization of direction interference and sign conflicts. Left: layer-wise average cosine similarity for every combination of models among 10 customized models. Right: fraction of parameters with sign conflicts as the number of customized models increases from 2 to 10.
  • Figure 5: Comparison of identity preservation for instant merging methods as the number of merged concepts increases from 1 to 15. Our method incurs only marginal identity loss, while the others degrade substantially.
  • ...and 2 more figures
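
The two quantities named in the Figure 4 caption, average cosine similarity between customized models and the fraction of parameters with sign conflicts, can be sketched as follows. This is a minimal illustration, not the paper's evaluation code. It assumes each model's update is a flat list of floats, that cosine similarity is averaged over pairs of models, and that a "sign conflict" means at least one model pushes a parameter up while another pushes it down. The function names and toy data are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two update vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(updates):
    """Average cosine similarity over every pair of models."""
    pairs = list(combinations(updates, 2))
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def sign_conflict_fraction(updates):
    """Fraction of positions with at least one positive and one negative update."""
    conflicts = 0
    for values in zip(*updates):
        has_pos = any(x > 0 for x in values)
        has_neg = any(x < 0 for x in values)
        if has_pos and has_neg:
            conflicts += 1
    return conflicts / len(updates[0])


# Toy updates for three hypothetical customized models (4 parameters each).
toy_updates = [
    [0.5, -0.2, 0.0, 0.3],
    [0.4, 0.1, 0.0, -0.3],
    [-0.1, 0.2, 0.7, 0.3],
]

print(mean_pairwise_cosine(toy_updates))
print(sign_conflict_fraction(toy_updates[:2]))  # two models
print(sign_conflict_fraction(toy_updates))      # three models
```

In this toy data the conflict fraction rises from 0.5 with two models to 0.75 with three, which mirrors the qualitative trend the caption describes: adding models can only keep or increase the number of conflicting positions under this definition.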