
PersonificationNet: Making customized subject act like a person

Tianchu Guo, Pengyu Li, Biao Wang, Xiansheng Hua

TL;DR

This work tackles the challenge of rendering a customized subject in the exact pose and background of a reference image. It introduces PersonificationNet, a three-component framework comprising a Customized Branch for appearance capture, a finetuned Pose Condition Branch for pose transfer, and a Structure Alignment Module that reconciles the subject's body proportions with the reference pose during inference. The customized branch is trained on 3–5 user-provided images with a rare-token identifier, the pose branch is finetuned on 55 subject-specific images, and the structure alignment step compensates for the mismatch in proportions so that the subject can adopt the reference pose; together they outperform DreamBooth and DreamBooth+ControlNet on two target subjects. The approach gives faithful pose and background transfer for customized subjects, making diffusion-based image synthesis more controllable and personalized in practical applications.

Abstract

Customized generation, which trains a model on as few as 3-5 user-provided images to synthesize new images of a specified subject, has recently shown significant potential. Although subsequent applications have enhanced the flexibility and diversity of customized generation, fine-grained control that makes the given subject imitate a person's pose remains understudied. In this paper, we propose PersonificationNet, which can make a specified subject, such as a cartoon character or plush toy, strike the same pose as the person in a given reference image. It contains a customized branch, a pose condition branch, and a structure alignment module. First, the customized branch mimics the specified subject's appearance. Second, the pose condition branch transfers body structure information from humans to varied instances. Last, the structure alignment module bridges the structural gap between the human and the specified subject at the inference stage. Experimental results show that the proposed PersonificationNet outperforms state-of-the-art methods.
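
The abstract does not describe how the structure alignment module works internally. Purely as an illustration of what "bridging the structure gap" could mean, the sketch below keeps each limb's direction from the reference person's 2D skeleton while substituting limb lengths measured on the subject. The joint names, the `PARENTS` table, and the `retarget_pose` function are hypothetical and are not taken from the paper.

```python
import math

# Hypothetical skeleton (assumption, not from the paper): child joint -> parent joint.
# Entries are ordered so that every parent is placed before its children.
PARENTS = {
    "neck": "hip",
    "head": "neck",
    "l_hand": "neck",
    "r_hand": "neck",
    "l_foot": "hip",
    "r_foot": "hip",
}


def retarget_pose(ref_pose, subject_lengths, root="hip"):
    """Return keypoints that follow the reference pose with the subject's proportions.

    ref_pose: dict mapping joint name -> (x, y) for the reference person.
    subject_lengths: dict mapping child joint -> length of the bone that ends
        at that joint, measured on the customized subject.
    """
    out = {root: ref_pose[root]}
    for child, parent in PARENTS.items():
        # Direction of this bone in the reference pose.
        dx = ref_pose[child][0] - ref_pose[parent][0]
        dy = ref_pose[child][1] - ref_pose[parent][1]
        norm = math.hypot(dx, dy)
        px, py = out[parent]
        if norm == 0.0:
            # Degenerate bone in the reference: collapse onto the parent.
            out[child] = (px, py)
            continue
        # Same direction as the reference, but the subject's bone length.
        scale = subject_lengths[child] / norm
        out[child] = (px + dx * scale, py + dy * scale)
    return out


if __name__ == "__main__":
    person = {
        "hip": (0.0, 0.0), "neck": (0.0, -4.0), "head": (0.0, -6.0),
        "l_hand": (-3.0, -4.0), "r_hand": (3.0, -4.0),
        "l_foot": (-3.0, 4.0), "r_foot": (3.0, 4.0),
    }
    # A short-limbed, big-headed subject such as a plush toy (made-up numbers).
    toy = {"neck": 2.0, "head": 3.0, "l_hand": 1.0, "r_hand": 1.0,
           "l_foot": 2.5, "r_foot": 2.5}
    for joint, xy in retarget_pose(person, toy).items():
        print(joint, xy)
```

In a pipeline like the one described, the retargeted keypoints would stand in for the raw human skeleton as the pose condition, so that the generator is not asked to stretch the subject into human proportions. Whether PersonificationNet does this in keypoint space is an assumption here.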

Paper Structure (12 sections, 4 equations, 5 figures)

Figures (5)

  • Figure 1: Problem definition and challenge. Left: given images of a specified subject (Mr. Potato), the goal is to make Mr. Potato replicate the actions in the reference image. Right: existing methods struggle to achieve this goal, whereas the proposed method succeeds.
  • Figure 2: Pipeline of the PersonificationNet.
  • Figure 3: Comparison with existing methods.
  • Figure 4: Results of ablation study.
  • Figure 5: Variant applications of the PersonificationNet.