FastFace: Tuning Identity Preservation in Distilled Diffusion via Guidance and Attention

Sergey Karpukhin, Vadim Titov, Andrey Kuznetsov, Aibek Alanov

TL;DR

FastFace addresses the challenge of adapting pretrained ID-preserving adapters to distilled diffusion models without retraining, enabling real-time, few-step generation. It introduces two complementary components, Decoupled Classifier-Free Guidance (DCG) and Attention Manipulation (AM), and integrates them into a universal training-free framework that handles both stylistic and realistic generation scenarios. DCG mathematically decouples identity conditioning from text conditioning and employs scheduling and rescaling to stabilize few-step inference; AM focuses attention maps on facial regions using scale-power and scheduled-softmask transforms, improving identity fidelity with minimal artifacts. The authors provide a disentangled evaluation dataset and demonstrate consistent improvements in identity preservation, prompt alignment, and image quality across multiple distilled checkpoints, highlighting FastFace’s practical impact for real-time, personalized diffusion-based generation.
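To make the DCG idea summarized above more concrete, here is a minimal sketch of what decoupling identity guidance from text guidance could look like. It is an illustration under stated assumptions, not the authors' formulation: the function names, the three-prediction decomposition, the std-matching rescale, and the linear schedule are all hypothetical, and only the use of two coefficients ($\alpha$ for text, $\beta$ for identity) is taken from the figure captions.

```python
import numpy as np


def decoupled_guidance(eps_uncond, eps_text, eps_text_id, alpha, beta, rescale=0.0):
    """Hypothetical decoupled guidance with separate text and identity weights.

    eps_uncond  : noise prediction with no conditioning
    eps_text    : noise prediction with the text prompt only
    eps_text_id : noise prediction with text prompt and identity adapter
    """
    # Direction contributed by the text prompt alone.
    text_dir = eps_text - eps_uncond
    # Direction contributed by the identity adapter on top of the text.
    id_dir = eps_text_id - eps_text
    guided = eps_uncond + alpha * text_dir + beta * id_dir

    if rescale > 0.0:
        # Assumed rescaling: match the standard deviation of the guided
        # prediction to the fully conditioned one, then blend the two.
        matched = guided * (eps_text_id.std() / (guided.std() + 1e-8))
        guided = rescale * matched + (1.0 - rescale) * guided
    return guided


def linear_schedule(start, end, num_steps):
    """Assumed per-step schedule for a coefficient (one value per sampling step)."""
    return np.linspace(start, end, num_steps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps_u, eps_t, eps_ti = rng.normal(size=(3, 4, 8, 8))
    betas = linear_schedule(1.0, 2.0, 4)  # e.g. a 4-step distilled sampler
    for step, beta in enumerate(betas):
        out = decoupled_guidance(eps_u, eps_t, eps_ti, alpha=1.5, beta=beta, rescale=0.7)
        print(step, float(beta), out.shape)
```

With `alpha == beta == w` this sketch reduces to ordinary classifier-free guidance with scale `w` on the fully conditioned prediction, and with both set to 1 it returns the conditioned prediction unchanged, which is why separate weights can be read as a strict generalization of the usual single guidance scale.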

Abstract

In recent years, a plethora of identity-preserving adapters for personalized generation with diffusion models has been released. Their main disadvantage is that they are predominantly trained jointly with base diffusion models, which suffer from slow multi-step inference. This work tackles the challenge of training-free adaptation of pretrained ID-adapters to diffusion models accelerated via distillation. Through a careful redesign of classifier-free guidance for few-step stylistic generation and attention manipulation mechanisms in decoupled blocks that improve identity similarity and fidelity, we propose the universal FastFace framework. Additionally, we develop a disentangled public evaluation protocol for ID-preserving adapters.

Paper Structure

This paper contains 47 sections, 14 equations, 28 figures, 3 tables.

Figures (28)

  • Figure 1: FastFace method framework. Left: high-level idea of the pipeline, enabling few-step ID-preserving generation. Right: effect of FastFace components on realistic and stylistic generations
  • Figure 2: Different cases of user intention during ID-preserving generation: (a) - stylistic, (b) - realistic
  • Figure 3: Scheduling effect on DCG, from right to left: baseline generation, then single-step alterations of the $\alpha$ and $\beta$ coefficients to a high value. In the first steps the image is completely corrupted, while the last step introduces local visual artifacts
  • Figure 4: Visual result of applying DCG to stylistic generation with various models
  • Figure 5: Visualization of attention maps at timesteps 749 and 499 in a decoupled block of SDXL (specifically up_blocks.0.attentions.2.transformer_blocks.6) in relation to the generation output
  • ...and 23 more figures