Finetuning-Free Personalization of Text to Image Generation via Hypernetworks

Sagar Shrestha; Gopal Sharma; Luowei Zhou; Suren Kumar

TL;DR

The paper tackles the problem of personalizing text-to-image diffusion models with minimal overhead. It introduces a hypernetwork, trained end to end, that predicts LoRA adapters directly from subject images for a frozen diffusion backbone. A simple $L_2$ regularization on the hypernetwork output stabilizes training and prevents overfitting, enabling reliable per-subject personalization without test-time optimization. The paper further proposes Hybrid-Model Classifier-Free Guidance (HM-CFG), which combines the base model's generalization with the personalized model's subject fidelity during sampling, improving prompt compliance while preserving subject details. Comprehensive experiments on CelebA-HQ, AFHQ-v2, and DreamBench show state-of-the-art results among tuning-free methods and substantial speedups over DreamBooth-style fine-tuning. Collectively, the approach offers a scalable path to open-category personalization with strong subject fidelity and controllable prompt adherence.
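
The summary above describes the training setup (a hypernetwork predicting LoRA factors for a frozen backbone, plus an $L_2$ penalty on the hypernetwork output) but gives no architecture details. The following is a minimal sketch under loudly stated assumptions: the frozen backbone is reduced to a single linear weight `W`, the hypernetwork is a hypothetical linear map `H` from a subject embedding to the flattened LoRA factors, and a squared-error loss stands in for the diffusion denoising loss. The names `hypernetwork`, `lora_weight`, `training_loss`, `alpha`, and `lam` are illustrative, not taken from the paper.

```python
import numpy as np


def hypernetwork(embedding, H, rank, d_out, d_in):
    """Hypothetical linear hypernetwork: maps a subject embedding to LoRA factors."""
    flat = H @ embedding  # length rank * d_in + d_out * rank
    n_a = rank * d_in
    A = flat[:n_a].reshape(rank, d_in)   # LoRA down-projection
    B = flat[n_a:].reshape(d_out, rank)  # LoRA up-projection
    return A, B


def lora_weight(W, A, B, alpha=1.0):
    """Frozen base weight plus the scaled low-rank update (alpha / rank) * B @ A."""
    return W + (alpha / A.shape[0]) * (B @ A)


def training_loss(W, H, embedding, x, y, rank, lam=1e-3):
    """Stand-in objective: task loss of the adapted layer plus an L2 penalty
    on the hypernetwork *output* (the predicted LoRA factors)."""
    d_out, d_in = W.shape
    A, B = hypernetwork(embedding, H, rank, d_out, d_in)
    W_personal = lora_weight(W, A, B)
    # Placeholder for the diffusion denoising loss (assumption, not the paper's loss).
    task = np.mean((x @ W_personal.T - y) ** 2)
    reg = lam * (np.sum(A ** 2) + np.sum(B ** 2))
    return task + reg, task, reg


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, rank, emb_dim = 4, 3, 2, 5
    W = rng.normal(size=(d_out, d_in))                                # frozen base weight
    H = 0.1 * rng.normal(size=(rank * d_in + d_out * rank, emb_dim))  # trainable hypernetwork
    e = rng.normal(size=emb_dim)                                      # subject-image embedding
    x = rng.normal(size=(8, d_in))
    y = rng.normal(size=(8, d_out))
    total, task, reg = training_loss(W, H, e, x, y, rank)
    print(f"total={total:.4f} task={task:.4f} reg={reg:.6f}")
```

In this sketch only `H` would be trained while `W` stays fixed, mirroring the frozen-backbone setup. Because the penalty acts on the predicted factors rather than on `H` itself, it directly limits the size of the weight update handed to the backbone; the paper's exact regularizer and weighting may differ.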

Abstract

Personalizing text-to-image diffusion models has traditionally relied on subject-specific fine-tuning approaches such as DreamBooth (Ruiz et al., 2023), which are computationally expensive and slow at test time because each new subject requires its own optimization. Recent adapter- and encoder-based methods attempt to reduce this overhead but still depend on additional fine-tuning or large backbone models for satisfactory results. In this work, we revisit an orthogonal direction: fine-tuning-free personalization via hypernetworks that predict LoRA-adapted weights directly from subject images. Prior hypernetwork-based approaches, however, suffer from costly data generation or unstable attempts to mimic the base model's optimization trajectories. We address these limitations with an end-to-end training objective, stabilized by a simple output regularization, that yields reliable and effective hypernetworks. Our method removes the need for per-subject optimization at test time while preserving both subject fidelity and prompt alignment. To further enhance compositional generalization at inference time, we introduce Hybrid-Model Classifier-Free Guidance (HM-CFG), which combines the compositional strengths of the base diffusion model with the subject fidelity of personalized models during sampling. Extensive experiments on CelebA-HQ, AFHQ-v2, and DreamBench demonstrate that our approach achieves strong personalization performance and highlight the promise of hypernetworks as a scalable and effective direction for open-category personalization.
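
The abstract describes HM-CFG only at a high level (combining the base model's compositional strengths with the personalized model's subject fidelity during sampling) and does not state its formula. The sketch below assumes one plausible form built on standard classifier-free guidance: the base model's unconditional and conditional noise predictions give ordinary CFG, and a second term with its own scale pulls the result toward the personalized model's conditional prediction. The functions `cfg` and `hybrid_cfg` and the weights `w_text` and `w_subject` are hypothetical, not the paper's definitions.

```python
import numpy as np


def cfg(eps_uncond, eps_cond, w):
    """Standard classifier-free guidance: extrapolate from the unconditional
    toward the conditional noise prediction with guidance scale w."""
    return eps_uncond + w * (eps_cond - eps_uncond)


def hybrid_cfg(eps_base_uncond, eps_base_cond, eps_pers_cond, w_text, w_subject):
    """Assumed hybrid guidance (illustrative, not the paper's formula):
    base-model CFG supplies prompt compliance, and an extra term steers the
    prediction toward the personalized (LoRA-adapted) model's output."""
    base = cfg(eps_base_uncond, eps_base_cond, w_text)
    return base + w_subject * (eps_pers_cond - eps_base_cond)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 8, 8)  # toy latent: channels x height x width
    # Stand-ins for the three noise predictions at one sampling step.
    eps_u, eps_c, eps_p = (rng.normal(size=shape) for _ in range(3))
    eps = hybrid_cfg(eps_u, eps_c, eps_p, w_text=7.5, w_subject=2.0)
    print(eps.shape)
```

Under this assumed form, setting `w_subject = 0` recovers plain CFG on the base model, and `w_text = w_subject = 1` returns the personalized prediction unchanged, so the two scales trade prompt compliance against subject fidelity.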
