TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation

Chengcheng Feng, Mu He, Qiuyu Tian, Haojie Yin, Xiaofang Zhao, Hongwei Tang, Xingqiang Wei

TL;DR

This work integrates Singular Value Decomposition (SVD) into the Low-Rank Adaptation (LoRA) parameter update strategy to improve the fine-tuning efficiency and output quality of image generation models.

Abstract

As deep learning advances, image generation models such as Stable Diffusion are increasingly used in visual arts creation. However, fine-tuning these models often leads to overfitting, unstable outputs, and difficulty capturing the features that creators want. To address these challenges, we propose a method that integrates Singular Value Decomposition (SVD) into the Low-Rank Adaptation (LoRA) parameter update strategy, with the aim of improving both the fine-tuning efficiency and the output quality of image generation models. By incorporating SVD within the LoRA framework, our method reduces the risk of overfitting, stabilizes model outputs, and captures subtle, creator-desired feature adjustments more accurately. We evaluated our method on multiple datasets; compared to traditional fine-tuning methods, it significantly improves the model's generalization ability and creative flexibility while maintaining generation quality. Moreover, the method retains LoRA's strong performance under resource-constrained conditions, improving image generation quality without sacrificing LoRA's efficiency and resource advantages.
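To make the idea of an SVD-style update inside LoRA more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the weight update takes a three-factor form, delta_W = U diag(s) V^T, in place of the two-factor LoRA product B A. The function names, the shapes, the zero initialization of `s`, and the `scale` parameter are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np

def lora_delta(B, A, scale=1.0):
    """Standard two-factor LoRA update: delta_W = scale * B @ A."""
    return scale * (B @ A)

def tri_factor_delta(U, s, V, scale=1.0):
    """Assumed SVD-style update: delta_W = scale * U @ diag(s) @ V.T.

    Multiplying the columns of U by s is equivalent to U @ np.diag(s).
    """
    return scale * ((U * s) @ V.T)

# Hypothetical sizes: an 8x6 weight matrix adapted with rank 2.
rng = np.random.default_rng(0)
d_out, d_in, r = 8, 6, 2

W0 = rng.normal(size=(d_out, d_in))   # frozen pre-trained weight
U = rng.normal(size=(d_out, r))       # trainable left factor
V = rng.normal(size=(d_in, r))        # trainable right factor
s = np.zeros(r)                       # trainable scaling vector, zero init (assumption)

# With s = 0 the adapted weight starts out identical to the frozen weight.
W_start = W0 + tri_factor_delta(U, s, V)

# After some (here simulated) training, s is non-zero and the update has rank <= r.
s_trained = np.array([0.5, -0.1])
delta = tri_factor_delta(U, s_trained, V)

# Trainable parameter counts for the two parameterizations.
lora_params = r * (d_out + d_in)
tri_params = r * (d_out + d_in) + r
```

Under these assumptions the three-factor form adds only `r` extra parameters over plain LoRA, which is consistent with the abstract's claim that the method keeps LoRA's resource advantages; how the factors are actually initialized and constrained in TriLoRA is not stated in this section.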

Paper Structure

This paper contains 16 sections, 5 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Structural Design of the TriLoRA Model.
  • Figure 2: Images generated by LoRA and TriLoRA after training for 10 epochs on the GAC dataset, in response to three prompts (one per column).
  • Figure 3: Images generated by the LoRA and TriLoRA models on the Pokemon dataset [pinkney2022pokemon] after 100 training epochs, corresponding to distinct prompts across columns.
  • Figure 4: Comparative Synthesis by LoRA and TriLoRA After 500 Training Epochs on the Scarlett Johansson Dataset [kaggleCelebrityFace] with the epiCRealism Pre-trained Model [huggingfaceEmilianJRepiCRealismHugging]. Prompt: "Neutral expression, evening makeup, light brown choppy bob hairstyle, stud earrings, burgundy lace dress, event background." Note the significant presence of bright artifacts in the image produced by LoRA.