DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution
Miaomiao Cai, Simiao Li, Wei Li, Xudong Huang, Hanting Chen, Jie Hu, Yunhe Wang
TL;DR
Real-World Image Super-Resolution (Real-ISR) models that are optimized purely for pixel-level reconstruction risk misalignment with human preferences, which can produce artifacts and hallucinations. The authors introduce Direct Semantic Preference Optimization (DSPO), a plug-and-play framework that injects semantic guidance through two strategies, semantic instance alignment and user description feedback, extending Direct Preference Optimization (DPO) to instance-level, semantically guided learning for SR. DSPO applies Best/Worst-of-N selection to instances segmented with SAM and uses diffusion-based loss terms measured against a reference model, while textual feedback constrains generation and suppresses hallucinations. Experiments on both one-step and multi-step SR frameworks show improved perceptual quality and fidelity, supported by human annotators and automatic image quality assessment (IQA) metrics, establishing a new paradigm for aligning SR outputs with human preferences.
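The instance-level Best/Worst-of-N step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the instance masks are assumed to be precomputed (for example by SAM), and `score_fn` is a hypothetical stand-in for whatever per-region quality scorer DSPO actually uses. The function `best_worst_of_n` and the toy data are introduced here for illustration only.

```python
import numpy as np


def best_worst_of_n(candidates, instance_masks, score_fn):
    """For each instance mask, pick the best and worst of N candidate SR outputs.

    candidates:     array of shape (N, H, W) or (N, H, W, C) holding N SR samples
                    generated for the same low-resolution input.
    instance_masks: list of boolean arrays of shape (H, W); assumed to come from
                    a segmentation model such as SAM (not computed here).
    score_fn:       hypothetical per-region quality scorer; higher is better.
    Returns a list of (best_index, worst_index) pairs, one per instance.
    """
    pairs = []
    for mask in instance_masks:
        # Score every candidate only on the pixels that belong to this instance.
        scores = np.array([score_fn(c[mask]) for c in candidates])
        pairs.append((int(np.argmax(scores)), int(np.argmin(scores))))
    return pairs


# Toy usage: three candidate 2x2 "images" and two instances (left / right column).
cands = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.1, 0.9], [0.1, 0.9]],
])
left = np.array([[True, False], [True, False]])
right = ~left
# Hypothetical scorer for the demo: brighter regions count as "better".
pairs = best_worst_of_n(cands, [left, right], score_fn=np.mean)
print(pairs)  # [(0, 2), (2, 0)]
```

In this toy case all three candidates have the same image-level mean (0.5), so a single image-level ranking cannot separate them, while the per-instance ranking prefers a different candidate for each region. This is the kind of fine-grained signal that instance-level alignment is meant to capture.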
Abstract
Recent advances in diffusion models have improved Real-World Image Super-Resolution (Real-ISR), but existing methods do not incorporate human feedback, which risks misalignment with human preferences and may lead to artifacts, hallucinations, and harmful content. To address this, we are the first to introduce human preference alignment into Real-ISR, a technique that has been successfully applied in Large Language Models and Text-to-Image tasks to better align generated outputs with human preferences. Specifically, we introduce Direct Preference Optimization (DPO) into Real-ISR, where DPO serves as a general alignment technique that learns directly from a human preference dataset. However, unlike in high-level tasks, the pixel-level reconstruction objectives of Real-ISR are difficult to reconcile with the image-level preferences of DPO, which can make DPO overly sensitive to local anomalies and thereby degrade generation quality. To resolve this dichotomy, we propose Direct Semantic Preference Optimization (DSPO), which aligns instance-level human preferences by incorporating semantic guidance through two strategies: (a) a semantic instance alignment strategy, which performs instance-level alignment to ensure fine-grained perceptual consistency, and (b) a user description feedback strategy, which mitigates hallucinations through semantic textual feedback on instance-level images. As a plug-and-play solution, DSPO proves highly effective in both one-step and multi-step SR frameworks.
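For reference, the standard DPO objective for a preferred output y_w and a rejected output y_l given input x is -log σ(β[log πθ(y_w|x)/πref(y_w|x) - log πθ(y_l|x)/πref(y_l|x)]). For diffusion models, Diffusion-DPO replaces these log-likelihood ratios with differences in noise-prediction error between the trained model and a frozen reference model. The sketch below restricts those errors to a single instance mask to illustrate what instance-level alignment might look like. This is an assumption made for illustration: the exact DSPO loss, its timestep weighting, and the names `masked_sq_error` and `instance_dpo_loss` are not taken from the paper and are hypothetical.

```python
import numpy as np


def masked_sq_error(eps_true, eps_pred, mask):
    """Mean squared noise-prediction error over the pixels inside `mask`."""
    diff = (eps_true - eps_pred)[mask]
    return float(np.mean(diff ** 2))


def instance_dpo_loss(eps_w, pred_w, ref_w, eps_l, pred_l, ref_l, mask, beta=1.0):
    """Diffusion-DPO-style loss restricted to one instance mask (illustrative only).

    eps_*:  noise actually added to the winning (w) / losing (l) sample.
    pred_*: noise predicted by the model being trained.
    ref_*:  noise predicted by the frozen reference model.
    Timestep weighting is assumed to be folded into `beta`.
    """
    # Error of the trained model minus error of the reference model,
    # computed separately on the winner and on the loser (negative = better).
    gap_w = masked_sq_error(eps_w, pred_w, mask) - masked_sq_error(eps_w, ref_w, mask)
    gap_l = masked_sq_error(eps_l, pred_l, mask) - masked_sq_error(eps_l, ref_l, mask)
    # The loss falls as the trained model improves on the winner relative to the loser.
    z = -beta * (gap_w - gap_l)
    # -log(sigmoid(z)) computed stably as log(1 + exp(-z)).
    return float(np.logaddexp(0.0, -z))


rng = np.random.default_rng(0)
shape = (4, 4)
mask = np.zeros(shape, dtype=bool)
mask[:2, :] = True  # hypothetical instance: the top half of the image
eps_w, eps_l = rng.normal(size=shape), rng.normal(size=shape)
ref_w, ref_l = rng.normal(size=shape), rng.normal(size=shape)

# Trained model identical to the reference: no preference signal, loss = log 2.
same = instance_dpo_loss(eps_w, ref_w, ref_w, eps_l, ref_l, ref_l, mask)
# Trained model predicts the winner's noise exactly: the loss drops below log 2.
better = instance_dpo_loss(eps_w, eps_w, ref_w, eps_l, ref_l, ref_l, mask)
print(round(same, 4), better < same)
```

Because only pixels inside the mask contribute, an anomaly outside the instance does not affect that instance's preference term. This illustrates how an instance-level formulation can avoid the oversensitivity to local anomalies that an image-level comparison suffers from.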
