FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation

Pengchong Qiao, Lei Shang, Chang Liu, Baigui Sun, Xiangyang Ji, Jie Chen

TL;DR

FaceChain-SuDe addresses the attribute-missing problem in one-shot subject-driven generation by modeling a subject as a derived class of its semantic category and introducing Subject-Derived regularization (SuDe). SuDe couples a private-attribute reconstruction loss with a category-inheritance loss that uses the implicit diffusion classifier to encourage generated images to semantically belong to the subject’s category while preserving subject fidelity. The method is plug-and-play and improves attribute alignment (BLIP-T) across DreamBooth, Custom Diffusion, and ViCo on two SD backbones, with a loss-truncation strategy keeping training stable. This broadens practical one-shot personalization by enabling more imaginative attribute-related generations without sacrificing subject identity.

Abstract

Subject-driven generation has garnered significant interest recently due to its ability to personalize text-to-image generation. Typical works focus on learning the new subject's private attributes. However, one important fact has been overlooked: a subject is not an isolated new concept but a specialization of a certain category in the pre-trained model. As a result, the subject fails to comprehensively inherit the attributes of its category, which leads to poor attribute-related generations. In this paper, motivated by object-oriented programming, we model the subject as a derived class whose base class is its semantic category. This modeling enables the subject to inherit public attributes from its category while learning its private attributes from the user-provided example. Specifically, we propose a plug-and-play method, Subject-Derived regularization (SuDe). It constructs the base-derived class modeling by constraining the subject-driven generated images to semantically belong to the subject's category. Extensive experiments under three baselines and two backbones on various subjects show that SuDe enables imaginative attribute-related generations while maintaining subject fidelity. Code will be open-sourced soon at FaceChain (https://github.com/modelscope/facechain).
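
The object-oriented analogy in the abstract can be illustrated with a small sketch. This is only an analogy in plain Python, not the SuDe method or any FaceChain API: the class names follow the 'Dog' and 'Spike' example used in the paper's Figure 1, while `IsolatedSpike`, the attribute lists, and the `can` and `describe` helpers are hypothetical names chosen for illustration.

```python
class Dog:
    """Base class: stands in for the category known to the pre-trained model."""

    # Public attributes: general category properties (illustrative values).
    public_attributes = ("running", "jumping", "swimming")

    def can(self, attribute: str) -> bool:
        """Return True if the category provides the requested attribute."""
        return attribute in self.public_attributes


class Spike(Dog):
    """Derived class: the subject, specialized from the 'Dog' category."""

    # Private attributes: subject-specific properties (illustrative values).
    private_attributes = ("golden retriever",)

    def describe(self, attribute: str) -> str:
        """Combine a private attribute with an inherited public attribute."""
        if not self.can(attribute):
            raise ValueError(f"attribute not available: {attribute}")
        return f"a {self.private_attributes[0]} named Spike, {attribute}"


class IsolatedSpike:
    """The subject treated as an isolated concept with no base class."""

    # Only what the single example image shows; nothing is inherited.
    private_attributes = ("golden retriever",)


if __name__ == "__main__":
    print(Spike().describe("running"))
    print(hasattr(IsolatedSpike(), "can"))
```

In this sketch `Spike` answers an attribute-related request such as 'running' only because it inherits from `Dog`, whereas `IsolatedSpike` has no access to category attributes at all. That gap is the analogue of the attribute-missing problem the abstract describes.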

Paper Structure (37 sections, 14 equations, 14 figures, 4 tables)

Figures (14)

  • Figure 1: (a) The subject is a golden retriever 'Spike', and the baseline is DreamBooth [ruiz2023dreambooth]. The baseline fails because the example image cannot provide the needed attributes, such as 'running'. Our method tackles this by letting 'Spike' inherit these attributes from the 'Dog' category. (b) We build 'Spike' as a derived class of the base class 'Dog'. In this paper, we record the general properties of the base class from the pre-trained model as public attributes, and subject-specific properties as private attributes. The part marked with a red wavy line is the 'Inherit' syntax in C++ [stroustrup1986overview].
  • Figure 2: The pipeline of SuDe. (a) Learn private attributes by reconstructing the subject example with the subject loss $\mathcal{L}_{sub}$. (b) Inherit public attributes by constraining the subject-driven $\bm{x}_{t-1}$ to semantically belong to its category (e.g., dog) with the SuDe loss $\mathcal{L}_{sude}$.
  • Figure 3: (a), (b), and (c) are images generated using DreamBooth [ruiz2023dreambooth], Custom Diffusion [kumari2023multi], and ViCo [hao2023vico] as the baselines, respectively. Results are obtained using the DDIM [song2020denoising] sampler with 100 steps. In the prompts, the subject token is marked in orange and the attributes in red.
  • Figure 4: Visual comparisons using different values of $w_s$. Results are from DreamBooth w/ SuDe, where the default $w_s$ is 0.4.
  • Figure 5: Loss truncation. SuDe generations with and without truncation, using Custom Diffusion as the baseline.
  • ...and 9 more figures