Measurement-Constrained Sampling for Text-Prompted Blind Face Restoration

Wenjie Li, Yulun Zhang, Guangwei Gao, Heng Guo, Zhanyu Ma

TL;DR

This work tackles the one-to-many ambiguity in blind face restoration under extreme degradation by introducing Measurement-Constrained Sampling (MCS), a training-free framework that leverages text-guided diffusion with measurement constraints. MCS unifies forward measurements that anchor facial structure to input degradation and reverse measurements that expand the solution space toward diverse, prompt-aligned reconstructions, guided by a selection mechanism that progresses from structure to semantics. The approach demonstrates state-of-the-art performance on no-reference metrics and strong qualitative results on real-world degraded faces, including the ability to generate text-aligned outputs when prompts are provided. The method offers a flexible, controllable, and practical pathway for personalized BFR without requiring paired training data, with potential applications in forensic, entertainment, and AR contexts.

Abstract

Under extremely low-quality (LQ) inputs, blind face restoration (BFR) admits multiple plausible high-quality (HQ) reconstructions. However, existing methods typically produce deterministic results and struggle to capture this one-to-many nature. In this paper, we propose a Measurement-Constrained Sampling (MCS) approach that enables diverse reconstructions of LQ faces conditioned on different textual prompts. Specifically, we formulate BFR as a measurement-constrained generative task by constructing an inverse problem through controlled degradations of coarse restorations, which allows posterior-guided sampling within text-to-image diffusion. Measurement constraints include both Forward Measurement, which ensures that results align with input structures, and Reverse Measurement, which produces projection spaces so that the solution can align with various prompts. Experiments show that our MCS can generate prompt-aligned results and outperforms existing BFR methods. Code will be released after acceptance.
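The idea of constraining a sample with a measurement can be illustrated with a toy example. The sketch below is not the MCS procedure; it only shows a generic measurement-consistency projection for a known linear degradation, assuming the degradation is block average pooling. The function names (`downsample`, `replicate`, `enforce_measurement`) and the choice of operator are hypothetical and do not come from the paper.

```python
import numpy as np

def downsample(x, factor=4):
    """Assumed degradation operator A: average-pool a 2D image by `factor`."""
    h, w = x.shape
    return x.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))

def replicate(r, factor=4):
    """Pseudo-inverse of average pooling: copy each value over its block."""
    return np.repeat(np.repeat(r, factor, axis=0), factor, axis=1)

def enforce_measurement(x, y, factor=4):
    """Project x onto the set {x : A(x) = y}.

    Only the block means of x are changed; the detail inside each block,
    which the measurement cannot see, is left untouched.
    """
    residual = downsample(x, factor) - y
    return x - replicate(residual, factor)

rng = np.random.default_rng(0)
y = downsample(rng.random((8, 8)))       # measurement of an unknown image
candidate = rng.random((8, 8))           # e.g. an intermediate sample
consistent = enforce_measurement(candidate, y)
```

Many different candidates project to outputs that share the same measurement `y`, which is a small-scale view of the one-to-many ambiguity the abstract describes.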

Paper Structure

This paper contains 24 sections, 16 equations, 9 figures, 5 tables, 1 algorithm.

Figures (9)

  • Figure 1: Illustration of the one-to-many nature of blind degradations and the effectiveness of our method in generating diverse, prompt-aligned face images from extremely low-quality inputs.
  • Figure 2: (Left) While real and constructed degradations differ, the gap diminishes in reverse diffusion, making constructed observations suitable anchors of inverse problem solving for guided sampling. (Right) When facial structures in noisy samples are unclear in the reverse diffusion process of T2I diffusion, inverse problem solving centred on constructed degradations amplifies noise, resulting in artifacts.
  • Figure 3: Our method utilizes Forward Measurement to align solutions with input structures, where the structural components obtained from the wavelet decomposition define the structure. Reverse Measurement ensures solutions lie within the degradation space centered on anchor points. Therefore, under the guidance of texts, our method can generate diverse facial results that satisfy both degradation and textual constraints.
  • Figure 4: We perform a statistical analysis of the convergence of the pre-trained T2I model [saharia2022photorealistic] during reverse diffusion on the CelebA-HQ test set. Image variance, structural gradients, and selected visualizations all show that the T2I model first reconstructs structural features in a small portion of facial regions, then refines the detailed features covering the majority of facial regions.
  • Figure 5: Quantitative results under mild degradation without textual prompts. Our method can achieve higher BFR quality than the latest learning-based [wang2023dr2, lin2024diffbir] and sampling-based [li2025self] methods.
  • ...and 4 more figures