ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs

Viraj Shah; Nataniel Ruiz; Forrester Cole; Erika Lu; Svetlana Lazebnik; Yuanzhen Li; Varun Jampani

TL;DR

ZipLoRA addresses the problem of generating a specific subject in an arbitrary style by merging independently trained subject and style LoRAs. It introduces a lightweight, hyperparameter-free optimization that learns per-column merger coefficients, minimizing interference between the two LoRAs while preserving what each one learned. The method builds on three observations: LoRA weight updates are sparse, SDXL learns a style well from a single exemplar, and LoRAs whose columns are highly aligned merge poorly when combined naively. Empirically, ZipLoRA outperforms direct merging and joint training across diverse subject-style pairs, is favored in user preference comparisons, and retains recontextualization and style controllability, while requiring only about 100 gradient steps. The approach offers a practical path to flexible diffusion-model personalization without extensive retraining or hyperparameter tuning.
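To make the per-column merge concrete, here is a minimal sketch in plain Python on toy 2x2 matrices. It assumes each LoRA update is a dense matrix stored as a list of rows, and that the merged update is the sum of the two updates after column j of each has been scaled by its own coefficient. The function names (`mean_column_alignment`, `merge_per_column`) and the coefficient values are hypothetical and hand-picked for illustration; ZipLoRA learns its coefficients by optimization, which is not shown here.

```python
import math


def column(matrix, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in matrix]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_column_alignment(dw_a, dw_b):
    """Mean absolute cosine similarity between matching columns of two updates."""
    n_cols = len(dw_a[0])
    sims = [
        abs(cosine_similarity(column(dw_a, j), column(dw_b, j)))
        for j in range(n_cols)
    ]
    return sum(sims) / n_cols


def merge_per_column(dw_subject, dw_style, m_subject, m_style):
    """Scale column j of each update by its coefficient, then add the results."""
    return [
        [
            m_subject[j] * s + m_style[j] * t
            for j, (s, t) in enumerate(zip(row_s, row_t))
        ]
        for row_s, row_t in zip(dw_subject, dw_style)
    ]


# Toy updates: column 0 is perfectly aligned, column 1 is orthogonal.
dw_subject = [[1.0, 0.0], [2.0, 1.0]]
dw_style = [[2.0, 1.0], [4.0, 0.0]]

alignment = mean_column_alignment(dw_subject, dw_style)

# All-ones coefficients give a plain sum of the two updates.
direct = merge_per_column(dw_subject, dw_style, [1.0, 1.0], [1.0, 1.0])

# Hand-picked (not learned) coefficients that damp one LoRA per column.
zipped = merge_per_column(dw_subject, dw_style, [1.0, 0.5], [0.5, 1.0])

print(alignment, direct, zipped)
```

With all coefficients set to 1 the function reduces to a plain sum of the two updates, which is what a direct merge amounts to under this sketch's assumptions. Non-uniform coefficients let each column favor one LoRA over the other.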

Abstract

Methods for finetuning generative models for concept-driven personalization generally achieve strong results for subject-driven or style-driven generation. Recently, low-rank adaptations (LoRA) have been proposed as a parameter-efficient way of achieving concept-driven personalization. While recent work explores the combination of separate LoRAs to achieve joint generation of learned styles and subjects, existing techniques do not reliably address the problem; they often compromise either subject fidelity or style fidelity. We propose ZipLoRA, a method to cheaply and effectively merge independently trained style and subject LoRAs in order to achieve generation of any user-provided subject in any user-provided style. Experiments on a wide range of subject and style combinations show that ZipLoRA can generate compelling results with meaningful improvements over baselines in subject and style fidelity while preserving the ability to recontextualize. Project page: https://ziplora.github.io

Paper Structure (9 sections, 3 equations, 9 figures, 2 tables)

Introduction
Related Work
Methods
Background
ZipLoRA
Experiments
Style-tuning behavior of SDXL model
Personalized Stylizations
Conclusion

Figures (9)

Figure 1: By effectively merging independently trained style and content LoRAs, our proposed method ZipLoRA is able to generate any user-provided subject in any user-provided style, providing unprecedented control over personalized creations using diffusion models.
Figure 2: LoRA weight matrices are sparse. Most of the elements in $\Delta W$ have a magnitude very close to zero, and can be conveniently thrown away without affecting the generation quality of the fine-tuned model.
Figure 3: Highly aligned LoRA weights merge poorly. When LoRA weight columns are highly aligned, a direct merge obtains subpar results. Instead, our approach minimizes the mean cosine similarity between the columns of the LoRA updates across the layers.
Figure 4: Overview of ZipLoRA. Our method learns mixing coefficients for each column of $\Delta W_i$ for both style and subject LoRAs. It does so by (1) minimizing the difference between subject/style images generated by the mixed LoRA and original subject/style LoRA models, while (2) minimizing the cosine similarity between the columns of content and style LoRAs. In essence, the zipped LoRA tries to conserve the subject and style properties of each individual LoRA, while minimizing signal interference of both LoRAs.
Figure 5: Style Learning using DreamBooth on SDXL. Top: SDXL model learns to produce stylized outputs when fine-tuned on a single example of a reference style using LoRA with a DreamBooth objective. Bottom: The stylizations produced by fine-tuned SDXL model are highly competent, compared to those of other models. Note that unlike StyleDrop, SDXL DreamBooth fine-tuning does not require human feedback.
...and 4 more figures
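
The sparsity observation in Figure 2 can be probed with a sketch like the following, which zeroes all but the largest-magnitude entries of a toy update matrix. The helper names, the toy values, and the 25% keep fraction are assumptions chosen for illustration, not figures taken from the paper.

```python
def prune_by_magnitude(delta_w, keep_fraction):
    """Zero every entry whose magnitude falls below the top-k cutoff.

    Entries tied with the cutoff magnitude are all kept, so slightly more
    than keep_fraction of the entries may survive.
    """
    magnitudes = sorted((abs(x) for row in delta_w for x in row), reverse=True)
    n_keep = max(1, int(keep_fraction * len(magnitudes)))
    cutoff = magnitudes[n_keep - 1]
    return [[x if abs(x) >= cutoff else 0.0 for x in row] for row in delta_w]


def zero_fraction(matrix):
    """Fraction of entries that are exactly zero."""
    values = [x for row in matrix for x in row]
    return sum(1 for x in values if x == 0.0) / len(values)


# Toy update: two large entries, the rest close to zero.
delta_w = [
    [0.9, 0.01, -0.02, 0.0],
    [0.03, -0.8, 0.005, 0.01],
]

pruned = prune_by_magnitude(delta_w, keep_fraction=0.25)
print(pruned)
print(zero_fraction(pruned))
```

Whether such pruning leaves generation quality intact can only be judged by running the fine-tuned model with the pruned update; this sketch covers the pruning step alone.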
