Face2Scene: Using Facial Degradation as an Oracle for Diffusion-Based Scene Restoration

Amirhossein Kazerouni, Maitreya Suin, Tristan Aumentado-Armstrong, Sina Honari, Amanpreet Walia, Iqbal Mohomed, Konstantinos G. Derpanis, Babak Taati, Alex Levinshtein

Abstract

Recent advances in image restoration have enabled high-fidelity recovery of faces from degraded inputs using reference-based face restoration models (Ref-FR). However, such methods focus solely on facial regions, neglecting degradation across the full scene, including body and background, which limits practical usability. Meanwhile, full-scene restorers often ignore degradation cues entirely, leading to underdetermined predictions and visual artifacts. In this work, we propose Face2Scene, a two-stage restoration framework that leverages the face as a perceptual oracle to estimate degradation and guide the restoration of the entire image. Given a degraded image and one or more identity references, we first apply a Ref-FR model to reconstruct high-quality facial details. From the restored-degraded face pair, we extract a face-derived degradation code that captures degradation attributes (e.g., noise, blur, compression), which is then transformed into multi-scale degradation-aware tokens. These tokens condition a diffusion model to restore the full scene in a single step, including the body and background. Extensive experiments demonstrate that the proposed method outperforms state-of-the-art methods.
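The stage II data flow described above (degraded/restored face pair, then a degradation code, then multi-scale tokens) can be illustrated with a toy sketch. This is not the paper's method: in Face2Scene these steps are performed by FaDeX and MapNet (Figure 1), which the sketch does not reproduce. As loudly labeled assumptions, the hypothetical `degradation_code` replaces FaDeX with hand-crafted residual statistics (a noise proxy and a blur proxy), and the hypothetical `to_multiscale_tokens` replaces MapNet with fixed random linear projections; the token widths (8, 16, 32) and four tokens per scale are arbitrary. The Ref-FR stage and the diffusion model itself are omitted.

```python
import math
import random
from typing import Dict, List

Image = List[List[float]]  # grayscale image as rows of pixel values


def degradation_code(lq_face: Image, hq_face: Image) -> List[float]:
    """Hypothetical stand-in for FaDeX (NOT the paper's encoder).

    Summarises the residual between the degraded (LQ) face and its restored (HQ)
    counterpart as [residual mean, residual std (noise proxy), gradient-energy
    ratio (blur proxy)].
    """
    residual = [
        [lq_px - hq_px for lq_px, hq_px in zip(lq_row, hq_row)]
        for lq_row, hq_row in zip(lq_face, hq_face)
    ]
    flat = [r for row in residual for r in row]
    mu = sum(flat) / len(flat)
    sigma = math.sqrt(sum((r - mu) ** 2 for r in flat) / len(flat))

    def grad_energy(img: Image) -> float:
        # Sum of squared horizontal differences; smoother images have lower energy.
        return sum((row[j + 1] - row[j]) ** 2 for row in img for j in range(len(row) - 1))

    # Ratio < 1 means the degraded face is smoother than the restored one.
    blur_ratio = grad_energy(lq_face) / (grad_energy(hq_face) + 1e-8)
    return [mu, sigma, blur_ratio]


def to_multiscale_tokens(
    code: List[float], dims=(8, 16, 32), tokens_per_scale: int = 4, seed: int = 0
) -> Dict[int, List[List[float]]]:
    """Hypothetical stand-in for MapNet: one fixed random linear projection per scale."""
    rng = random.Random(seed)  # fixed seed so the projections are reproducible
    tokens: Dict[int, List[List[float]]] = {}
    for dim in dims:
        # Weight matrix of shape (tokens_per_scale * dim, len(code)).
        weights = [[rng.gauss(0.0, 1.0) for _ in code] for _ in range(tokens_per_scale * dim)]
        projected = [sum(w * c for w, c in zip(row, code)) for row in weights]
        # Split the projected vector into tokens_per_scale tokens of width dim.
        tokens[dim] = [projected[t * dim:(t + 1) * dim] for t in range(tokens_per_scale)]
    return tokens


if __name__ == "__main__":
    hq = [[float((i + j) % 2) for j in range(8)] for i in range(8)]  # sharp checkerboard "face"
    lq = [[0.5] * 8 for _ in range(8)]  # the same face after total blurring
    code = degradation_code(lq, hq)
    print(code)  # [0.0, 0.5, 0.0]
    print({dim: (len(t), len(t[0])) for dim, t in to_multiscale_tokens(code).items()})
    # {8: (4, 8), 16: (4, 16), 32: (4, 32)}
```

The only point of the sketch is the shape of the interface: the code depends on the face pair alone, yet it is expanded into tokens at every scale, so the same face-derived estimate can condition the whole scene.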

Paper Structure

This paper contains 36 sections, 7 equations, 21 figures, and 12 tables.

Figures (21)

  • Figure 1: Overview of the Face2Scene pipeline. In stage I, we leverage a set of reference faces to restore the low-quality (LQ) face crop. In stage II, we use the pair of LQ and restored high-quality (HQ) faces to extract a degradation code with FaDeX and inject it into a one-step diffusion model using MapNet. The diffusion model then reconstructs the full-scene image.
  • Figure 2: Visual comparison of Face2Scene with the three top-performing methods from the quantitative results (zoom in to see details).
  • Figure 3: Cosine similarity analysis. We show the cosine similarity between embeddings of image pairs with different degradations. (Left) Similarities per degradation type, averaged over images. (Right) Similarities per image, averaged over degradation types. The shaded area shows the standard deviation. This confirms that FaDeX isolates degradation from image content.
  • Figure 4: Scene diversity visualization. Word-cloud representation of the semantic tags extracted from our synthetic dataset using the RAM++ image-tagging model [zhang2024recognize]. Larger words indicate tags that occur more frequently across generated images, highlighting the broad diversity of scenes captured in our dataset.
  • Figure 5: Four representative images of our synthetic InScene dataset. Each sample is shown alongside its structured prompt. We also show the stored metadata, including facial landmarks, bounding boxes, and associated identity labels.
  • ...and 16 more figures
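The per-degradation and per-image aggregation described in the Figure 3 caption can be sketched as follows. This rests on an assumption about the protocol, since the caption does not spell it out: for each degradation type, embeddings of different images that share that degradation are compared, so an encoder that isolates degradation from content should score near 1. The embeddings below are toy vectors rather than FaDeX outputs, and `cosine` and `similarity_summary` are hypothetical names.

```python
import math
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Embedding = List[float]


def cosine(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similarity_summary(emb: Dict[Tuple[str, str], Embedding]):
    """Summarise cross-image similarities of degradation embeddings.

    emb maps (image_id, degradation_type) -> embedding. For every degradation
    type, each pair of different images sharing that degradation is compared.
    Returns (per_degradation, per_image), each mapping a key to (mean, std).
    """
    images = sorted({img for img, _ in emb})
    degradations = sorted({deg for _, deg in emb})
    per_image_scores: Dict[str, List[float]] = {img: [] for img in images}
    per_degradation = {}
    for deg in degradations:
        scores = []
        for a, b in combinations(images, 2):
            s = cosine(emb[(a, deg)], emb[(b, deg)])
            scores.append(s)
            per_image_scores[a].append(s)
            per_image_scores[b].append(s)
        # Left-panel analogue: averaged over image pairs for this degradation.
        per_degradation[deg] = (mean(scores), pstdev(scores))
    # Right-panel analogue: averaged over degradation types (and partner images).
    per_image = {img: (mean(v), pstdev(v)) for img, v in per_image_scores.items()}
    return per_degradation, per_image


if __name__ == "__main__":
    # Toy encoder that ignores content: one fixed embedding per degradation type.
    ideal = {(img, deg): vec for img in "abc"
             for deg, vec in [("noise", [1.0, 0.0]), ("blur", [0.0, 1.0])]}
    # Toy encoder that only sees content: one fixed embedding per image.
    leaky = {(img, deg): vec
             for img, vec in [("a", [1.0, 0.0, 0.0]), ("b", [0.0, 1.0, 0.0]), ("c", [0.0, 0.0, 1.0])]
             for deg in ("noise", "blur")}
    print(similarity_summary(ideal)[0])  # {'blur': (1.0, 0.0), 'noise': (1.0, 0.0)}
    print(similarity_summary(leaky)[0])  # {'blur': (0.0, 0.0), 'noise': (0.0, 0.0)}
```

Under this reading, a high mean with a small standard deviation in both views is the signature of an embedding that tracks the degradation rather than the image content, which is the behavior the caption attributes to FaDeX.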