ExpressEdit: Fast Editing of Stylized Facial Expressions with Diffusion Models in Photoshop

Kenan Tang, Jiasheng Guo, Jeffrey Lin, Yao Qin

Abstract

Facial expressions of characters are a vital component of visual storytelling. While current AI image editing models hold promise for assisting artists in the task of stylized expression editing, these models introduce global noise and pixel drift into the edited image, preventing the integration of these models into professional image editing software and workflows. To bridge this gap, we introduce ExpressEdit, a fully open-source Photoshop plugin that is free from common artifacts of proprietary image editing models and robustly synergizes with native Photoshop operations such as Liquify. ExpressEdit seamlessly edits an expression within 3 seconds on a single consumer-grade GPU, significantly faster than popular proprietary models. Moreover, to support the generation of diverse expressions according to different narrative needs, we compile a comprehensive expression database of 135 expression tags enriched with example stories and images designed for retrieval-augmented generation. We open source the code and dataset to facilitate future research and artistic exploration.

Paper Structure

This paper contains 19 sections, 10 figures, and 2 tables.

Figures (10)

  • Figure 1: ExpressEdit can generate diverse, stylized expressions on an original image. The first row shows the original image, and the second and third rows show the edited images, with the user-specified expression above each image. ExpressEdit can handle both detailed multi-word descriptions (such as "clenched teeth") and emoticons (such as "@_@"), generating stylized depictions of expressions with ease.
  • Figure 2: ExpressEdit consists of two consecutive pipelines for a user-friendly yet professional editing experience. The prompt generation pipeline takes in a story paragraph and uses a VLM to retrieve relevant expression tags from a multi-modal expression database we curate. The relevant expression tags are inserted into a prompt, which is used in the image editing pipeline. The image editing pipeline starts with the user applying coarse transformations (such as Liquify) and casual selections on the original image, taking at most a few seconds of manual effort. Then, combined with the prompt, ExpressEdit robustly generates high-quality expressions based on the inputs.
  • Figure 3: Creative short stories are included in ExpressEdit to assist retrieval-augmented generation (RAG). The prompt template and an example story are shown here.
  • Figure 4: Baseline methods introduce destructive noise into the original image after each editing step, whereas the pixel changes from ExpressEdit are non-destructive, and the minor artifacts are easily repaired. Please see the section on clean edits for details.
  • Figure 5: The selection version of Nano Banana Pro in Photoshop and the naive inpainting both create visible artifacts around the selected region. Nano Banana Pro leaves artifacts around earlobes and the chin, making it impossible to contain the destructive noise via selecting. Naive inpainting also leaves artifacts on the right side of the neck and on the braid, necessitating the use of the SPICE backend.
  • ...and 5 more figures
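
The retrieval step described in Figure 2 (matching a story paragraph against a tag database) can be illustrated with a toy sketch. This is not the ExpressEdit implementation, which uses a VLM over a multi-modal database; the tag names, example stories, and word-overlap similarity below are illustrative assumptions only.

```python
# Hypothetical sketch of retrieval-augmented tag selection: given a story
# paragraph, rank expression tags by lexical overlap with each tag's
# example story. ExpressEdit itself uses a VLM over a multi-modal
# database; everything here is a simplified stand-in.

def tokenize(text: str) -> set[str]:
    """Lowercase, strip trailing punctuation, and return the word set."""
    return {w.strip(".,!?").lower() for w in text.split()}

def retrieve_tags(story: str, database: dict[str, str], k: int = 2) -> list[str]:
    """Return the k tags whose example story overlaps most with `story`."""
    story_words = tokenize(story)
    scored = []
    for tag, example in database.items():
        overlap = len(story_words & tokenize(example))
        scored.append((overlap, tag))
    scored.sort(reverse=True)  # highest overlap first
    return [tag for _, tag in scored[:k]]

# Illustrative database: tag -> example story (invented for this sketch).
database = {
    "clenched teeth": "He gritted his teeth in anger as the door slammed.",
    "@_@": "Her head spun, dizzy and dazed after the long ride.",
    "wide smile": "She beamed with joy, smiling from ear to ear.",
}

story = "The hero gritted his teeth in anger and charged forward."
print(retrieve_tags(story, database, k=1))  # → ['clenched teeth']
```

The retrieved tag would then be inserted into the editing prompt, as the caption describes; in the real pipeline, image examples in the database also inform the retrieval.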