DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion

Yujia Wu, Yiming Shi, Jiwei Wei, Chengwei Sun, Yang Yang, Heng Tao Shen

TL;DR

Comprehensive experimental results demonstrate that DiffLoRA outperforms existing personalization approaches across multiple benchmarks, achieving time efficiency while maintaining identity fidelity throughout the personalization process.

Abstract

Personalized text-to-image generation has gained significant attention for its ability to generate high-fidelity portraits of specific identities conditioned on user-defined prompts. Existing methods typically rely on test-time fine-tuning or an additional pre-trained branch. However, these approaches struggle to simultaneously achieve efficiency, identity fidelity, and preservation of the model's original generative capabilities. In this paper, we propose DiffLoRA, an efficient method that uses a diffusion model as a hypernetwork to predict personalized Low-Rank Adaptation (LoRA) weights from reference images. By incorporating these LoRA weights into an off-the-shelf text-to-image model, DiffLoRA enables zero-shot personalization during inference, eliminating the need for post-processing optimization. Moreover, we introduce a novel identity-oriented LoRA weight construction pipeline to facilitate the training of DiffLoRA. The dataset generated through this pipeline enables DiffLoRA to produce consistently high-quality LoRA weights. Notably, the probabilistic modeling inherent to diffusion models helps DiffLoRA generate superior weights by capturing intricate structural patterns and thoroughly exploring the weight space. Comprehensive experimental results demonstrate that DiffLoRA outperforms existing personalization approaches across multiple benchmarks, achieving time efficiency while maintaining identity fidelity throughout the personalization process.
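The central mechanism in the abstract is that predicted LoRA weights are merged into an existing model instead of running as a separate branch. As a minimal sketch, assuming the common LoRA convention W' = W + (α/r)·B·A applied to a single weight matrix, the merge looks like the following. The helper names `matmul` and `merge_lora`, the α/r scaling, and the toy matrices are illustrative assumptions; the abstract does not say which layers DiffLoRA adapts or how it scales the update.

```python
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Multiply an (n x m) matrix by an (m x p) matrix."""
    assert len(x[0]) == len(y), "inner dimensions must match"
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def merge_lora(w: Matrix, b: Matrix, a: Matrix, alpha: float) -> Matrix:
    """Return W + (alpha / r) * B @ A, with B of shape (d x r) and A of shape (r x k).

    Hypothetical helper: the alpha / r scaling follows the common LoRA
    convention and is not a detail confirmed for DiffLoRA.
    """
    r = len(a)  # LoRA rank = number of rows of A
    delta = matmul(b, a)  # low-rank update with the same shape as W (d x k)
    assert len(delta) == len(w) and len(delta[0]) == len(w[0]), "update must match W"
    scale = alpha / r
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy example: a 2x3 base weight and a rank-1 update (made-up values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]
B = [[1.0], [2.0]]        # d x r = 2 x 1
A = [[0.5, 0.0, -0.5]]    # r x k = 1 x 3
merged = merge_lora(W, B, A, alpha=1.0)
print(merged)  # [[1.5, 0.0, -0.5], [1.0, 1.0, -1.0]]
```

Because the merged matrix has the same shape as W, the base model's architecture is unchanged after merging, so a merged LoRA adds no extra inference branch or per-step cost.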

Paper Structure (19 sections, 6 equations, 8 figures, 3 tables)

Figures (8)

  • Figure 1: DiffLoRA generates personalized LoRA weights from reference images, enabling personalized image synthesis by directly merging these weights into an off-the-shelf text-to-image model.
  • Figure 2: A toy experiment demonstrating the superior compression and reconstruction performance of low-rank features.
  • Figure 3: Overview of DiffLoRA. We begin by encoding and reconstructing the LoRA weights using the LoRA Weights Autoencoder (LAE) to compress them into latent representations. In the diffusion process, these noisy latent representations are processed by a diffusion transformer conditioned on Mixed Image Features (MIF), integrating both facial and image features from reference images. During inference, the diffusion model takes random noise and reference images as input to generate personalized LoRA weights.
  • Figure 4: An example illustrating the importance of large LoRA weights in preserving identity information.
  • Figure 5: Pipeline for constructing the LoRA weights dataset.
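Figure 3 describes a latent-diffusion setup applied to LoRA weights: the LAE maps LoRA weights to latent representations, and a diffusion transformer learns to denoise noisy versions of those latents. The sketch below shows only the generic DDPM forward (noising) step on a latent vector, z_t = sqrt(ᾱ_t)·z_0 + sqrt(1 − ᾱ_t)·ε, together with the ε target a noise-prediction model would regress. It assumes a linear β schedule from 1e-4 to 0.02 over 1000 steps; these are common DDPM defaults, not settings the caption confirms. The function names and latent values are hypothetical, and the LAE, the diffusion transformer, and the MIF conditioning are not modeled.

```python
import math
import random
from typing import List, Tuple


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars = []
    prod = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def add_noise(z0: List[float], t: int, alpha_bars: List[float],
              rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample z_t ~ N(sqrt(alpha_bar_t) * z0, (1 - alpha_bar_t) * I).

    Returns (z_t, eps); eps is the target a noise-prediction model would learn.
    """
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(z0, eps)]
    return z_t, eps


rng = random.Random(0)
alpha_bars = linear_alpha_bars(1000)
z0 = [0.3, -1.2, 0.8, 0.0]  # hypothetical stand-in for an LAE latent of LoRA weights
z_early, _ = add_noise(z0, 0, alpha_bars, rng)      # nearly identical to z0
z_late, eps = add_noise(z0, 999, alpha_bars, rng)   # close to pure Gaussian noise
```

At inference, the caption describes the reverse direction: starting from random noise, the denoiser (conditioned on reference-image features) iteratively recovers a latent, which the LAE decoder would then map back to LoRA weights.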