LoFA: Learning to Predict Personalized Priors for Fast Adaptation of Visual Generative Models

Yiming Hao, Mutian Xu, Chongjie Ye, Jie Qin, Shunlin Lu, Yipeng Qin, Xiaoguang Han

TL;DR

LoFA tackles the inefficiency of personalizing visual generative models with LoRA by revealing structured LoRA response maps that capture how prompts influence parameter changes. It introduces a two-stage hypernetwork that first predicts a low-dimensional response map and then uses that guidance to predict full, uncompressed LoRA weights, enabling adaptation within seconds without sacrificing expressive capacity. The authors validate LoFA on three tasks (Personalized Human Action Video Generation, Text-to-Video Stylization, and Identity-Personalized Image Generation), reporting quality comparable or superior to per-case LoRA at a fraction of the adaptation time. The work offers a practical pathway to real-time, user-centric personalization in visual generation, with broad implications for deploying personalized diffusion-based systems.

Abstract

Personalizing visual generative models to meet specific user needs has gained increasing attention, yet current methods like Low-Rank Adaptation (LoRA) remain impractical due to their demand for task-specific data and lengthy optimization. While a few hypernetwork-based approaches attempt to predict adaptation weights directly, they struggle to map fine-grained user prompts to complex LoRA distributions, limiting their practical applicability. To bridge this gap, we propose LoFA, a general framework that efficiently predicts personalized priors for fast model adaptation. We first identify a key property of LoRA: structured distribution patterns emerge in the relative changes between LoRA and base model parameters. Building on this, we design a two-stage hypernetwork: first predicting relative distribution patterns that capture key adaptation regions, then using these to guide final LoRA weight prediction. Extensive experiments demonstrate that our method consistently predicts high-quality personalized priors within seconds, across multiple tasks and user prompts, even outperforming conventional LoRA that requires hours of processing. Project page: https://jaeger416.github.io/lofa/.
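
The abstract's "relative changes between LoRA and base model parameters" can be made concrete with a small NumPy sketch. The abstract does not give the exact formula, so everything here is an assumption for illustration: the element-wise ratio |ΔW| / (|W| + eps), the block-average pooling, and the helper names `lora_delta`, `relative_response`, and `pool_blocks` are hypothetical and not taken from the paper.

```python
import numpy as np

def lora_delta(B, A, scale=1.0):
    """Dense weight update implied by LoRA factors: delta_W = scale * B @ A."""
    return scale * (B @ A)

def relative_response(W, delta_W, eps=1e-8):
    """ASSUMED definition: element-wise relative change |delta_W| / (|W| + eps)."""
    return np.abs(delta_W) / (np.abs(W) + eps)

def pool_blocks(R, block):
    """Average non-overlapping block x block tiles to obtain a coarse map."""
    rows, cols = R.shape
    assert rows % block == 0 and cols % block == 0
    return R.reshape(rows // block, block, cols // block, block).mean(axis=(1, 3))

# Toy sizes and random values, chosen only for illustration.
rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2
W = rng.normal(size=(d_out, d_in))          # stand-in for a base-model weight
B = rng.normal(size=(d_out, rank)) * 0.1    # stand-in for LoRA "up" factor
A = rng.normal(size=(rank, d_in)) * 0.1     # stand-in for LoRA "down" factor

delta_W = lora_delta(B, A)
response = relative_response(W, delta_W)    # same shape as W
coarse = pool_blocks(response, block=4)     # low-dimensional 2 x 2 summary
```

Under these assumptions, large entries of `coarse` would mark regions of the weight matrix that the adaptation changes most relative to the base weights, which is one plausible reading of "key adaptation regions".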

Paper Structure

This paper contains 47 sections, 8 equations, 14 figures, 9 tables.

Figures (14)

  • Figure 1: We propose LoFA, a general framework that predicts personalized priors (i.e., LoRA weights) within seconds for fast adaptation of visual generative models. We evaluate its effectiveness across multiple personalization tasks: (a) Personalized Human Action Video Generation, (b) Text-to-Video Stylization, and (c) Identity-Personalized Image Generation. Across all tasks, LoFA achieves generation quality comparable or superior to conventional LoRA fine-tuning, which typically requires hours of data collection and expert optimization. This suggests LoFA's potential for practical applications.
  • Figure 2: Visualization of LoRA response maps. Each row corresponds to a distinct task-specific LoRA, while columns represent different network layers or blocks.
  • Figure 3: An overview of our LoFA. Conditioned on different user prompts, our network takes the base model weight $W$ as input and predicts the LoRA response map $\hat{R}$ (Figure 2) at Stage-I. Next, Stage-II inherits Stage-I's architecture and uses the learned information of the response map to guide the final prediction of the full LoRA weights.
  • Figure 4: Qualitative results on text-conditioned Personalized Human Action Video Generation.
  • Figure 5: Qualitative results on pose-conditioned Personalized Human Action Video Generation.
  • ...and 9 more figures
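
The two-stage data flow described in the Figure 3 caption can be illustrated with a toy NumPy sketch. This is not the authors' architecture. The linear maps, the softplus output, the choice to feed the predicted map into Stage II as an extra input, and all names (`stage_one`, `stage_two`, `stage1_weights`, `stage2_B_weights`, `stage2_A_weights`) are assumptions made only to show which quantities flow where.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, d_prompt, map_size = 8, 8, 2, 4, 2

# Hypothetical, randomly initialised linear "hypernetwork" parameters (untrained).
n_in = d_prompt + d_out * d_in
n_guided = n_in + map_size * map_size
stage1_weights = rng.normal(size=(map_size * map_size, n_in)) * 0.01
stage2_B_weights = rng.normal(size=(d_out * rank, n_guided)) * 0.01
stage2_A_weights = rng.normal(size=(rank * d_in, n_guided)) * 0.01

def stage_one(prompt, W):
    """Stage I (toy): prompt embedding + base weight -> coarse response map R_hat."""
    features = np.concatenate([prompt, W.ravel()])
    logits = stage1_weights @ features
    # Softplus keeps the predicted map positive (an assumption of this sketch).
    return np.logaddexp(0.0, logits).reshape(map_size, map_size)

def stage_two(prompt, W, R_hat):
    """Stage II (toy): same inputs plus R_hat as guidance -> LoRA factors B, A."""
    features = np.concatenate([prompt, W.ravel(), R_hat.ravel()])
    B = (stage2_B_weights @ features).reshape(d_out, rank)
    A = (stage2_A_weights @ features).reshape(rank, d_in)
    return B, A

prompt = rng.normal(size=d_prompt)      # stand-in for an encoded user prompt
W = rng.normal(size=(d_out, d_in))      # stand-in for a base-model weight

R_hat = stage_one(prompt, W)
B, A = stage_two(prompt, W, R_hat)
adapted_W = W + B @ A                   # base weight plus predicted LoRA update
```

The point of the sketch is only the ordering: the response map is predicted first, and the weight prediction is conditioned on it rather than made directly from the prompt.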