
DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution

Miaomiao Cai, Simiao Li, Wei Li, Xudong Huang, Hanting Chen, Jie Hu, Yunhe Wang

TL;DR

Real-World Image Super-Resolution (Real-ISR) models trained purely on pixel-level reconstruction objectives risk misalignment with human preferences, which can produce artifacts and hallucinations. The authors introduce Direct Semantic Preference Optimization (DSPO), a plug-and-play framework that extends Direct Preference Optimization (DPO) to instance-level, semantically guided learning for SR through two strategies: semantic instance alignment and user description feedback. DSPO applies Best/Worst-of-N selection to semantically segmented instances (obtained via SAM) and uses diffusion-based loss terms measured against a reference model, while textual feedback constrains generation and suppresses hallucinations. Experiments on one-step and multi-step SR frameworks show improved perceptual quality and fidelity under both human annotation and automatic IQA evaluation, which the authors present as a new paradigm for aligning SR outputs with human preferences.
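
The contrast between image-level and instance-level Best/Worst-of-N selection can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `best_worst_of_n`, the instance labels, and the scores are all hypothetical, and the per-instance scores are assumed to be given already (in the paper, instances come from SAM segmentation and preferences from annotators or IQA metrics).

```python
from typing import Dict, List, Tuple


def best_worst_of_n(scores: List[Dict[str, float]]) -> Dict[str, Tuple[int, int]]:
    """Pick, for every instance, the best and worst candidate index.

    scores[i][instance_id] is a (hypothetical) quality score of SR
    candidate i restricted to the region of that instance.
    """
    if not scores:
        raise ValueError("need at least one candidate")
    pairs: Dict[str, Tuple[int, int]] = {}
    for instance_id in scores[0]:
        per_candidate = [s[instance_id] for s in scores]
        indices = range(len(per_candidate))
        best = max(indices, key=per_candidate.__getitem__)
        worst = min(indices, key=per_candidate.__getitem__)
        pairs[instance_id] = (best, worst)
    return pairs


# Three SR candidates (N = 3) scored on two made-up instances.
candidate_scores = [
    {"face": 0.9, "sign": 0.2},
    {"face": 0.4, "sign": 0.8},
    {"face": 0.6, "sign": 0.5},
]

# Instance-level selection: one (preferred, dispreferred) pair per instance.
pairs = best_worst_of_n(candidate_scores)

# Image-level selection for comparison: a single winner by mean score.
mean_scores = [sum(s.values()) / len(s) for s in candidate_scores]
image_level_winner = max(range(len(mean_scores)), key=mean_scores.__getitem__)
```

In this toy data the image-level winner is candidate 1, yet candidate 1 is the worst choice for the "face" instance. This is the kind of local conflict that instance-level pairs are meant to expose.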

Abstract

Recent advances in diffusion models have improved Real-World Image Super-Resolution (Real-ISR), but existing methods lack human feedback integration, risking misalignment with human preferences and potentially leading to artifacts, hallucinations, and harmful content generation. To this end, we are the first to introduce human preference alignment into Real-ISR, a technique that has been successfully applied in Large Language Models and Text-to-Image tasks to enhance the alignment of generated outputs with human preferences. Specifically, we introduce Direct Preference Optimization (DPO) into Real-ISR to achieve alignment, where DPO serves as a general alignment technique that learns directly from a human preference dataset. Nevertheless, unlike in high-level tasks, the pixel-level reconstruction objectives of Real-ISR are difficult to reconcile with the image-level preferences of DPO, which can make DPO overly sensitive to local anomalies and reduce generation quality. To resolve this dichotomy, we propose Direct Semantic Preference Optimization (DSPO), which aligns instance-level human preferences by incorporating semantic guidance through two strategies: (a) a semantic instance alignment strategy, which implements instance-level alignment to ensure fine-grained perceptual consistency, and (b) a user description feedback strategy, which mitigates hallucinations through semantic textual feedback on instance-level images. As a plug-and-play solution, DSPO proves highly effective in both one-step and multi-step SR frameworks.
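
For context, the generic DPO objective that the abstract builds on can be written as follows. This is the standard formulation from the DPO literature, not the paper's DSPO loss, which according to the abstract operates at instance level and adds semantic guidance. Reading $x$ as the low-resolution input and $y_w$, $y_l$ as the preferred and dispreferred SR outputs is an assumption made here for illustration.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Here $\pi_\theta$ is the model being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, $\sigma$ is the logistic function, and $\beta$ controls how far the trained model may drift from the reference.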

Paper Structure

This paper contains 33 sections, 8 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: The dilemma between image-level preferences in DPO and pixel-level reconstruction objectives in Real-ISR: The preferred image, selected by the 'winner of overall image visual pleasure' rule, appears sharper in the yellow bbox and other areas but shows local hallucinations in the red bbox, where the dispreferred image performs better.
  • Figure 2: Overview of the proposed direct semantic preference optimization (DSPO) method.
  • Figure 3: User preference win rates of DSPO compared to the pre-trained method, SFT, DDPO, and Diffusion-DPO on the Dreal dataset (top) and the Real dataset (bottom), under human annotation. We provide the 95% confidence interval of the win rate based on three independent annotation rounds.
  • Figure 4: Quantitative comparison of DSPO with baselines under the automatic IQA method. (a)-(c) show radar plots for the one-step SR framework on RealSR, DRealSR, and DIV2K-val, while (d)-(f) show radar plots for the multi-step SR framework on the same datasets. Note that all metrics are normalized and their trends are adjusted to be monotonically positive.
  • Figure 5: Visual comparison of DSPO and other baseline methods under the human annotator method (top) and the automatic IQA method (bottom).
  • ...and 1 more figures