DomainGallery: Few-shot Domain-driven Image Generation by Attribute-centric Finetuning

Yuxuan Duan, Yan Hong, Bo Zhang, Jun Lan, Huijia Zhu, Weiqiang Wang, Jianfu Zhang, Li Niu, Liqing Zhang

Abstract

Recent progress in text-to-image models pretrained on large-scale datasets has enabled us to generate various images as long as we provide a text prompt describing what we want. Nevertheless, the applicability of these models is still limited when we expect to generate images that fall into a specific domain that is either hard to describe or simply unseen by the models. In this work, we propose DomainGallery, a few-shot domain-driven image generation method that finetunes pretrained Stable Diffusion on few-shot target datasets in an attribute-centric manner. Specifically, DomainGallery features prior attribute erasure, attribute disentanglement, regularization and enhancement. These techniques are tailored to few-shot domain-driven generation in order to solve key issues that previous works have failed to settle. Extensive experiments validate the superior performance of DomainGallery on a variety of domain-driven generation scenarios. Code is available at https://github.com/Ldhlwh/DomainGallery.

Paper Structure

This paper contains 47 sections, 7 equations, 11 figures, 1 table.

Figures (11)

  • Figure 1: Given a few-shot target dataset of a specific domain such as sketches painted by an artist (a), it is usually difficult to directly generate images of this domain using pretrained text-to-image models (b). With DomainGallery, proposed in this work, we can achieve domain-driven generation in intra-category (c), cross-category (d), extra attribute (e), and personalization (f) scenarios.
  • Figure 2: An overview of DomainGallery. (a) Before finetuning, we erase the prior attributes of the identifier [V] by matching the predicted noises when using source/target text conditions via $L_{\mathrm{erase}}$. (b) During finetuning, besides training ordinarily on target datasets (top-left), we additionally impose domain-category attribute disentanglement loss $L_{\mathrm{disen}}$ (bottom-left) and transfer-based similarity consistency loss $L_{\mathrm{sim}}$ (right). (c) When generating cross-category images, we enhance the domain attributes referred to by [V] in a CFG-like manner. Dashed arrows indicate gradient stopping.
  • Figure 3: The 10-shot CUFS sketches dataset (left) and the intra-category samples generated by the baselines and DomainGallery with prompt "a [V] face" (right).
  • Figure 4: The 10-shot datasets (left) and the cross-category samples generated by the baselines and DomainGallery (right), on Van Gogh houses (top) and watercolor dogs (bottom).
  • Figure 5: Intra-category (top row) and cross-category (middle row) samples generated by DomainGallery on CUFS sketches, with extra attributes given by texts. The bottom row additionally shows the case where the text contains conflicting attributes.
  • ...and 6 more figures