LoRA Diffusion: Zero-Shot LoRA Synthesis for Diffusion Model Personalization
Ethan Smith, Rami Seid, Alberto Hojel, Paramita Mishra, Jianbo Wu
TL;DR
To enable fast, zero-shot personalization of diffusion models, this work defines a low-dimensional manifold $M\subset \mathbb{R}^N$ of LoRA parameters with dimension $R \ll N$ and trains a generative model to sample new LoRAs conditioned on domain cues. It introduces a VAE-based latent encoding of LoRA vectors (with $m=512$) and applies diffusion with $x_0$- and $v$-predictions, showing that VAE latents and Gaussian priors yield superior reconstruction and conditioning fidelity. The proposed ADALoRA conditioning mechanism further improves attribute control, achieving about a 30% gain in ArcFace similarity over AdaNorm. Together, these components enable near-instantaneous LoRA synthesis for personalized diffusion outputs, reducing training costs while preserving identity fidelity and extending rapid adaptation to broader content domains.
Abstract
Low-Rank Adaptation (LoRA) and other parameter-efficient fine-tuning (PEFT) methods provide low-memory, storage-efficient solutions for personalizing text-to-image models. However, these methods offer little to no improvement in wall-clock training time or the number of steps needed for convergence compared to full model fine-tuning. While PEFT methods assume that shifts in generated distributions (from base to fine-tuned models) can be effectively modeled through weight changes in a low-rank subspace, they fail to leverage knowledge of common use cases, which typically focus on capturing specific styles or identities. Observing that desired outputs often comprise only a small subset of the possible domain covered by LoRA training, we propose reducing the search space by incorporating a prior over regions of interest. We demonstrate that training a hypernetwork model to generate LoRA weights can achieve competitive quality for specific domains while enabling near-instantaneous conditioning on user input, in contrast to traditional training methods that require thousands of steps.
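As a rough illustration of the idea in the abstract, the sketch below shows a low-rank weight update $W + BA$ and a toy "hypernetwork" that maps a conditioning vector directly to the LoRA factors in a single forward pass, rather than optimizing them over many training steps. It is not the architecture used in this work: the single linear map `H`, the function names, and all dimensions are hypothetical choices made only to keep the example small.

```python
import numpy as np

def apply_lora(W, A, B, scale=1.0):
    """Return the adapted weight W + scale * (B @ A), where B @ A has rank <= r."""
    return W + scale * (B @ A)

def toy_hypernetwork(cond, H, d_out, d_in, rank):
    """Map a conditioning vector to LoRA factors (A, B).

    Hypothetical stand-in: a single linear layer H. A real hypernetwork
    would be a trained, much larger model.
    """
    flat = H @ cond                                   # shape: (rank * (d_in + d_out),)
    A = flat[: rank * d_in].reshape(rank, d_in)       # down-projection factor
    B = flat[rank * d_in:].reshape(d_out, rank)       # up-projection factor
    return A, B

rng = np.random.default_rng(0)
d_out, d_in, rank, cond_dim = 8, 6, 2, 4              # illustrative sizes only

W = rng.normal(size=(d_out, d_in))                    # frozen base weight
H = 0.1 * rng.normal(size=(rank * (d_in + d_out), cond_dim))  # untrained toy weights
cond = rng.normal(size=cond_dim)                      # e.g. an identity or style embedding

A, B = toy_hypernetwork(cond, H, d_out, d_in, rank)   # one forward pass, no fine-tuning loop
W_adapted = apply_lora(W, A, B)

delta = B @ A
delta_rank = np.linalg.matrix_rank(delta, tol=1e-10)  # bounded above by `rank`
```

The point of the sketch is the cost profile: once the generator is trained, producing a new adapter is a single forward pass conditioned on user input, while the base weight `W` stays frozen and the update remains confined to a rank-`rank` subspace.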
