Test-Time Preference Optimization for Image Restoration

Bingchen Li, Xin Li, Jiaqi Xu, Jiaming Guo, Wenbo Li, Renjing Pei, Zhibo Chen

TL;DR

This work addresses the misalignment between typical image restoration (IR) outputs and human preferences by proposing Test-Time Preference Optimization (TTPO), a training-free, three-stage pipeline. TTPO generates preference data on the fly via diffusion inversion, selects preferred and dispreferred samples by fusing no-reference image quality assessment (NR-IQA) metrics, and guides conditioned diffusion denoising with reward signals to sharpen perceptual quality while preserving structure. Because it operates on the output of any IR backbone, TTPO applies across denoising, super-resolution, deraining, and low-light enhancement, and it integrates with zero-shot diffusion-based IR methods. Key innovations include frequency-decomposed guidance that separates texture from structure, a three-stage denoising schedule, and model-agnostic diffusion-based editing that produces restored images aligned with human preferences without retraining. Overall, TTPO offers a practical path to perceptually superior IR results, validated by extensive quantitative experiments and user studies, though it is limited by computational cost and by the gap between NR-IQA scores and human judgments. The approach could improve user satisfaction when restoration systems are deployed in real-world imaging pipelines.
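To make the idea of separating texture from structure concrete, the following is a minimal sketch on a 1D signal. It is an illustration under stated assumptions, not the paper's method: the box-blur low-pass filter, the function names (`box_blur`, `decompose`, `frequency_guided_mix`), and the `texture_weight` parameter are all hypothetical, and the summary above does not specify which decomposition TTPO actually uses or where in the denoising process it is applied.

```python
def box_blur(signal, radius=1):
    """Low-pass filter (assumed choice): mean over a window clamped at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def decompose(signal, radius=1):
    """Split a signal into a low band (structure) and a high band (texture).

    The high band is the residual, so low + high reconstructs the input.
    """
    low = box_blur(signal, radius)
    high = [s - l for s, l in zip(signal, low)]
    return low, high


def frequency_guided_mix(y0, y_w, radius=1, texture_weight=1.0):
    """Hypothetical combination: keep the structure of the initial restoration
    y0 and borrow texture from the preferred image y_w."""
    low0, _ = decompose(y0, radius)
    _, high_w = decompose(y_w, radius)
    return [l + texture_weight * h for l, h in zip(low0, high_w)]


if __name__ == "__main__":
    y0 = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]    # toy "initially restored" signal
    y_w = [0.1, -0.1, 1.2, 0.8, 1.1, 0.0]  # toy "preferred" signal with more detail
    print(frequency_guided_mix(y0, y_w))
```

The point of the split is that edits confined to the high band cannot move large-scale content, which is one plausible way to sharpen detail while preserving structure.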

Abstract

Image restoration (IR) models are typically trained to recover high-quality images using L1 or LPIPS loss. To handle diverse unknown degradations, zero-shot IR methods have also been introduced. However, existing pre-trained and zero-shot IR approaches often fail to align with human preferences, resulting in restored images that may not be favored. This highlights the critical need to enhance restoration quality and adapt flexibly to various image restoration tasks or backbones without requiring model retraining and ideally without labor-intensive preference data collection. In this paper, we propose the first Test-Time Preference Optimization (TTPO) paradigm for image restoration, which enhances perceptual quality, generates preference data on-the-fly, and is compatible with any IR model backbone. Specifically, we design a training-free, three-stage pipeline: (i) generate candidate preference images online using diffusion inversion and denoising based on the initially restored image; (ii) select preferred and dispreferred images using automated preference-aligned metrics or human feedback; and (iii) use the selected preference images as reward signals to guide the diffusion denoising process, optimizing the restored image to better align with human preferences. Extensive experiments across various image restoration tasks and models demonstrate the effectiveness and flexibility of the proposed pipeline.
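The three stages described above can be sketched as a toy generation-selection-optimization loop. Everything below is an assumption made for illustration: bounded random perturbation stands in for diffusion inversion and denoising, an arbitrary `score_fn` stands in for preference-aligned metrics or human feedback, and the `reward` function with its gradient-ascent update is a hypothetical choice, since the abstract does not give the actual reward or guidance formulation. No diffusion model is involved.

```python
import random


def generate_candidates(y0, num_candidates=4, noise_scale=0.1, seed=0):
    """Stage (i), toy stand-in: perturb y0 with noise bounded by noise_scale.

    The paper uses diffusion inversion and denoising here instead.
    """
    rng = random.Random(seed)
    return [[p + rng.uniform(-noise_scale, noise_scale) for p in y0]
            for _ in range(num_candidates)]


def select_preferences(candidates, score_fn):
    """Stage (ii): highest score is preferred (y_w), lowest is dispreferred (y_l)."""
    ranked = sorted(candidates, key=score_fn)
    return ranked[-1], ranked[0]


def reward(y, y_w, y_l):
    """Hypothetical reward: squared distance to y_l minus squared distance to y_w."""
    dist_l = sum((a - b) ** 2 for a, b in zip(y, y_l))
    dist_w = sum((a - b) ** 2 for a, b in zip(y, y_w))
    return dist_l - dist_w


def optimize(y0, y_w, y_l, steps=10, step_size=0.01):
    """Stage (iii), toy stand-in: gradient ascent on the hypothetical reward.

    The gradient of reward() with respect to y is 2 * (y_w - y_l).
    """
    y = list(y0)
    for _ in range(steps):
        y = [p + step_size * 2.0 * (w - l) for p, w, l in zip(y, y_w, y_l)]
    return y


if __name__ == "__main__":
    y0 = [0.2, 0.5, 0.8]                      # toy "initially restored image"
    candidates = generate_candidates(y0)
    contrast = lambda img: max(img) - min(img)  # placeholder quality score
    y_w, y_l = select_preferences(candidates, contrast)
    y_ttpo = optimize(y0, y_w, y_l)
    print(reward(y0, y_w, y_l), reward(y_ttpo, y_w, y_l))
```

In this sketch, bounding the perturbation in stage (i) keeps every candidate close to the initial restoration, so the preferred and dispreferred images differ from it mainly in fine detail.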

Paper Structure

This paper contains 24 sections, 11 equations, 10 figures, 8 tables.

Figures (10)

  • Figure 1: We use diffusion editing to generate candidate preference images. To preserve structural consistency, the noise scale is limited to a suitable range.
  • Figure 2: The pipeline of TTPO, which follows a generation-selection-optimization paradigm to perform test-time preference optimization for restored images. The pseudocode is provided as Algorithm 1 in the algorithm section (sec:algo).
  • Figure 3: Qualitative comparisons between the initially restored image $y_0$, optimized image $y_{\text{TTPO}}$, preferred image $y_w$, and dispreferred image $y_l$.
  • Figure 4: Loss curves for two images from the main qualitative comparison figure (fig:mainexp) across all diffusion timesteps.
  • Figure 5: Integrating TTPO with DDNM (Wang et al., 2023), an existing zero-shot diffusion-based IR (ZSDIR) method.
  • ...and 5 more figures