
InstantRestore: Single-Step Personalized Face Restoration with Shared-Image Attention

Howard Zhang, Yuval Alaluf, Sizhuo Ma, Achuta Kadambi, Jian Wang, Kfir Aberman

TL;DR

InstantRestore addresses identity-preserving face restoration under severe degradation with a fast, single-pass approach. A shared-image attention mechanism transfers identity information from a small set of reference images directly into a one-step diffusion-based generator, and a landmark attention supervision loss refines the resulting attention maps. The generator is Stable Diffusion Turbo adapted with LoRA, with AdaIN aligning reference features to the degraded input. It is trained with image-based losses, ArcFace identity guidance, and a DINO-v2 adversarial loss, and it achieves near real-time performance while preserving the identity of subjects unseen during training. Experimental results show competitive image fidelity and significantly improved identity preservation, with scalable performance and robustness to real-world degradations, making the method suitable for large-scale deployment.
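
The generator is described as a LoRA adaptation of Stable Diffusion Turbo. As a rough illustration of what a LoRA adapter does to a single frozen linear layer, here is a minimal NumPy sketch. The pretrained weight stays fixed and only a low-rank update is trained. The class name `LoRALinear`, the rank, `alpha`, and the initialization scale are illustrative assumptions, not settings reported for InstantRestore.

```python
import numpy as np


class LoRALinear:
    """A frozen linear layer W plus a trainable low-rank update (alpha / rank) * B @ A.

    Hypothetical illustration; rank, alpha and init scale are assumptions.
    """

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 4.0, seed: int = 0):
        out_features, in_features = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # pretrained weight; kept frozen during fine-tuning
        # A starts small and random, B starts at zero, so the adapter is initially a no-op.
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))
        self.B = np.zeros((out_features, rank))
        self.scale = alpha / rank

    def delta(self) -> np.ndarray:
        # Low-rank weight update; its rank is at most `rank`.
        return self.scale * (self.B @ self.A)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, in_features) -> (batch, out_features)
        return x @ (self.weight + self.delta()).T

    def merge(self) -> np.ndarray:
        # After training, the update can be folded into W so inference costs nothing extra.
        return self.weight + self.delta()
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained one exactly. Only `A` and `B` are updated, which is far fewer parameters than the full matrix. This section does not say which UNet layers receive adapters or with what rank.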

Abstract

Face image restoration aims to enhance degraded facial images while addressing challenges such as diverse degradation types, real-time processing demands, and, most crucially, the preservation of identity-specific features. Existing methods often struggle with slow processing times and suboptimal restoration, especially under severe degradation, failing to accurately reconstruct fine-grained identity details. To address these issues, we introduce InstantRestore, a novel framework that leverages a single-step image diffusion model and an attention-sharing mechanism for fast and personalized face restoration. Additionally, InstantRestore incorporates a novel landmark attention loss, aligning key facial landmarks to refine the attention maps, enhancing identity preservation. At inference time, given a degraded input and a small set of reference images (~4), InstantRestore performs a single forward pass through the network to achieve near real-time performance. Unlike prior approaches that rely on full diffusion processes or per-identity model tuning, InstantRestore offers a scalable solution suitable for large-scale applications. Extensive experiments demonstrate that InstantRestore outperforms existing methods in quality and speed, making it an appealing choice for identity-preserving face restoration.
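
Figure 3 shows the ideal attention maps used by the landmark attention loss. This section does not give the loss formula, so the following is only a sketch under stated assumptions. For a query token lying on a landmark in the degraded input, we take the target to be a distribution that puts equal mass on the same landmark in every reference. We then penalize the cross-entropy between that target and the predicted attention row. The function name, the token-grid indexing, the hard (one-hot) target, and the choice of cross-entropy are all hypothetical; the paper's exact target and distance may differ.

```python
import numpy as np


def landmark_attention_loss(attn, input_landmarks, ref_landmarks, grid_w, eps=1e-8):
    """Hypothetical landmark attention loss: cross-entropy against an ideal attention map.

    attn:            (N_q, K * N_ref) row-stochastic attention from input tokens to the
                     tokens of K references, concatenated reference by reference.
    input_landmarks: (L, 2) integer (row, col) landmark positions on the input token grid.
    ref_landmarks:   (K, L, 2) positions of the same L landmarks in each reference.
    grid_w:          token-grid width, assumed identical for the input and all references.
    """
    num_refs, num_landmarks, _ = ref_landmarks.shape
    n_ref_tokens = attn.shape[1] // num_refs
    losses = []
    for l in range(num_landmarks):
        r, c = input_landmarks[l]
        q = r * grid_w + c  # query token lying on landmark l in the degraded input
        # Assumed ideal map: all attention mass on landmark l, split evenly across references.
        target = np.zeros(attn.shape[1])
        for k in range(num_refs):
            rr, rc = ref_landmarks[k, l]
            target[k * n_ref_tokens + rr * grid_w + rc] = 1.0 / num_refs
        losses.append(-np.sum(target * np.log(attn[q] + eps)))
    return float(np.mean(losses))
```

With this choice, attention that exactly matches the target attains the minimum value, which is the entropy of the target itself. Uniform attention over all reference tokens is penalized more heavily. During training, this would be computed on differentiable attention maps inside the network; NumPy here only evaluates the value.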

Paper Structure

This paper contains 45 sections, 11 equations, 18 figures, 6 tables.

Figures (18)

  • Figure 1: Overview of InstantRestore. Given a pretrained single-step diffusion model $\mathcal{G}$ (shown in blue), we fine-tune it to map a degraded input image $\mathbf{I}_{low}$ to a high-quality restored output $\mathbf{I}_{rest}$ in a single forward pass. Our restoration model is trained using a combination of perceptual (LPIPS), identity (ID), and MSSIM losses, along with an adversarial loss from a DINO-v2-based discriminator $D$. To integrate identity-specific features from a small set of reference images, we use a frozen copy of the diffusion model, $\mathcal{G}_{ref}$, to extract keys and values from the references. These keys and values replace those of the generated image within the UNet decoder, injecting identity-related information into the restoration process. During inference, a single forward pass is performed, resulting in a runtime of ${\sim}0.5$ seconds.
  • Figure 2: Modified Extended Self-Attention Block. Given a query $Q_{rest}$ extracted from the degraded input, we reconstruct identity-specific features from the reference images, weighting each reference location by the relevance of its key $K_r$ to the query (top). The bottom block shows our modified self-attention block, in which the values $V_r$ from the reference images are aligned with those of $\mathbf{I}_{low}$ using AdaIN [huang2017arbitrary]. These aligned values then transfer identity-related information, weighted by the relevance scores.
  • Figure 3: Attention visualization. For a given query, indicated by the red dot on the left, we show the ideal attention maps used in our landmark attention supervision (LAS) loss (top) alongside the attention maps produced by our extended self-attention across all reference images (bottom).
  • Figure 4: Qualitative Comparison on Synthetic Degradations. Existing restoration techniques often struggle to retain identity-specific details, such as eye color (first two rows) or facial hair (last two rows). In contrast, InstantRestore successfully restores these features with similar or better runtime. Sample references of the target identity are shown on the left; additional results are provided in the additional qualitative results section.
  • Figure 5: Qualitative Comparison to Dual-Pivot Tuning [chari2023personalized]. We achieve visual quality and identity preservation comparable to Dual-Pivot Tuning without requiring per-identity tuning, while running an order of magnitude faster.
  • ...and 13 more figures
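
Figures 1 and 2 describe the core mechanism. Queries from the degraded input attend to keys and values that a frozen copy of the model extracts from the references, and the reference values are first aligned to the input's values with AdaIN. Below is a minimal single-head NumPy sketch of that idea, not the authors' implementation. The function names are hypothetical. It also assumes AdaIN statistics computed per reference over the token axis, no learned projections, and reference keys and values that fully replace the input's own, following the wording of Figure 1.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Re-normalize `content` tokens to the per-channel mean/std of `style` tokens.

    content: (M, d), style: (N, d); statistics are taken over the token axis.
    """
    c_mean, c_std = content.mean(0, keepdims=True), content.std(0, keepdims=True) + eps
    s_mean, s_std = style.mean(0, keepdims=True), style.std(0, keepdims=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean


def shared_image_attention(q_rest, v_low, k_refs, v_refs):
    """Hypothetical single-head attention from the degraded image to K reference images.

    q_rest: (N, d) queries of the degraded input; v_low: (N, d) its own values.
    k_refs, v_refs: (K, M, d) keys/values extracted from the references by a frozen model.
    Returns the attended features (N, d) and the attention weights (N, K * M).
    """
    d = q_rest.shape[-1]
    # AdaIN aligns each reference's values with the statistics of the input's values.
    v_aligned = np.stack([adain(v, v_low) for v in v_refs])
    # Reference keys/values stand in for the input's own; all references are concatenated.
    keys = k_refs.reshape(-1, d)
    values = v_aligned.reshape(-1, d)
    # Relevance of every reference token to every query (scaled dot product).
    weights = softmax(q_rest @ keys.T / np.sqrt(d), axis=-1)
    return weights @ values, weights
```

In the actual model, this would run inside the self-attention layers of the UNet decoder, with learned projections and multiple heads that are omitted here. The `weights` array plays the role of the attention maps visualized in Figure 3. An "extended" variant that also keeps the input's own keys and values would simply concatenate them with the reference tokens along the token axis.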