LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing

Aoyang Liu; Qingnan Fan; Shuai Qin; Hong Gu; Yansong Tang

TL;DR

This work tackles non-rigid image editing that preserves subject identity by learning a personalized identity prior from only a few reference images. It introduces LIPE, a two-stage framework: (1) data-augmented learning of a subject-specific prior by fine-tuning a diffusion model, with updates limited to the attention layers, and (2) a non-rigid editing mechanism called NIMA that uses identity-aware cross-attention masks to guide latent blending during denoising. The authors also present a dedicated dataset, likewise named LIPE, spanning objects, animals, and humans, and report qualitative and quantitative evaluations in which LIPE outperforms strong baselines in identity preservation, background fidelity, and prompt alignment for non-rigid edits. The approach offers a practical path toward controllable, identity-consistent image editing with minimal data on the target subject.
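The first stage restricts fine-tuning to the attention layers. A minimal sketch of that idea, assuming parameters can be picked out by name, is given below. The marker substrings (`attn`, `attention`) and the example parameter names are hypothetical and are not taken from the paper or from any specific library.

```python
from typing import Dict, List, Tuple

# Hypothetical marker substrings; real models may name attention modules differently.
ATTENTION_MARKERS: Tuple[str, ...] = ("attn", "attention")


def select_trainable(param_names: List[str]) -> Dict[str, bool]:
    """Map each parameter name to True if it should receive gradient updates."""
    return {
        name: any(marker in name for marker in ATTENTION_MARKERS)
        for name in param_names
    }


# Illustrative parameter names only (assumed, not from the paper).
names = [
    "down.0.resnet.conv1.weight",
    "down.0.attn1.to_q.weight",
    "mid.attn2.to_k.weight",
    "up.1.resnet.norm.bias",
]
trainable = select_trainable(names)
```

In a real training loop, the resulting flags would typically decide which parameters are frozen and which are passed to the optimizer.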

Abstract

Although recent years have witnessed significant advancements in image editing thanks to the remarkable progress of text-to-image diffusion models, the problem of non-rigid image editing still presents its complexities and challenges. Existing methods often fail to achieve consistent results due to the absence of unique identity characteristics. Thus, learning a personalized identity prior might help with consistency in the edited results. In this paper, we explore a novel task: learning the personalized identity prior for text-based non-rigid image editing. To address the problems in jointly learning prior and editing the image, we present LIPE, a two-stage framework designed to customize the generative model utilizing a limited set of images of the same subject, and subsequently employ the model with learned prior for non-rigid image editing. Experimental results demonstrate the advantages of our approach in various editing scenarios over past related leading methods in qualitative and quantitative ways.

Paper Structure (40 sections, 11 equations, 11 figures, 5 tables, 1 algorithm)

Introduction
Method
Personalized Identity Prior
Non-rigid Image Editing via Identity-aware Mask Blend
Experiments
Dataset
Comparisons with Previous Works
Qualitative Results on General Objects
Qualitative Results on Human Faces
User study
Quantitative Evaluation
Ablation Study
Conclusion
Appendix / supplemental material
Related Work
...and 25 more sections

Figures (11)

Figure 1: Given a few reference images of the same identity, our framework learns a personalized identity prior and applies diverse non-rigid image editing for a test image guided by a textual description, leading to high identity-preserved edited results.
Figure 2: The pipeline for data augmentation in learning the personalized identity prior. (a) We create detailed editing-oriented captions for the reference images using the Large Language and Vision Assistant (LLaVA). (b) We use GPT-4 and a pre-trained T2I model to generate diverse editing-oriented text-image pairs for the subject's class, which serve as the regularization dataset.
Figure 3: Illustration of Non-rigid Image editing via identity-aware MAsk blend (NIMA). (a) Given a test image, we first invert it to obtain the inverted latents $\{x_i\}$ for image reconstruction, to further obtain the subject mask $M^s$ for the source image. (b) Afterward, to achieve non-rigid image editing, we generate the target image by blending the source $x_t$ and target $\hat{x}_T$ information with the generated masks ($M^s$, $M_t^e$).
Figure 4: Identity-aware attention map.
Figure 5: Comparisons with previous work on general objects. The red font highlights the editing directions. Left to right: reference images, test image, Imagic (Kawar et al., 2023), MasaCtrl (Cao et al., 2023), DreamCtrl, and our method.
...and 6 more figures
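
The mask-guided blending described in the Figure 3 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's exact rule: it assumes the two masks ($M^s$, $M_t^e$) are combined by a union, that the target latent is kept inside the union and the source latent outside it (to preserve the background), and that a mask is obtained by thresholding a normalized attention map. The function names, the threshold value, and the toy shapes are all hypothetical.

```python
import numpy as np


def attention_to_mask(attn_map: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binarize an attention map after min-max normalization (assumed scheme)."""
    lo, hi = attn_map.min(), attn_map.max()
    norm = (attn_map - lo) / (hi - lo + 1e-8)
    return (norm > threshold).astype(np.float32)


def blend_latents(
    source_latent: np.ndarray,  # shape (C, H, W), from inverting the test image
    target_latent: np.ndarray,  # shape (C, H, W), from the edited denoising path
    subject_mask: np.ndarray,   # shape (H, W), plays the role of M^s
    edit_mask: np.ndarray,      # shape (H, W), plays the role of M_t^e
) -> np.ndarray:
    """Keep target latents inside the union of both masks, source latents outside."""
    union = np.maximum(subject_mask, edit_mask)
    return union[None] * target_latent + (1.0 - union[None]) * source_latent


# Toy example: zeros stand in for the source latent, ones for the target latent.
source = np.zeros((4, 8, 8), dtype=np.float32)
target = np.ones((4, 8, 8), dtype=np.float32)

subject_mask = np.zeros((8, 8), dtype=np.float32)
subject_mask[2:5, 2:5] = 1.0   # where the subject sits in the source image

edit_attn = np.zeros((8, 8), dtype=np.float32)
edit_attn[4:7, 4:7] = 1.0      # where the subject attends in the edited image
edit_mask = attention_to_mask(edit_attn)

blended = blend_latents(source, target, subject_mask, edit_mask)
```

In this toy setup, pixels covered by either mask take the target value and all remaining pixels keep the source value, which is the intended effect of protecting the background while letting the subject change pose.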
