LucidNFT: LR-Anchored Multi-Reward Preference Optimization for Generative Real-World Super-Resolution

Song Fei; Tian Ye; Sixiang Chen; Zhaohu Xing; Jianyu Lai; Lei Zhu

TL;DR

LucidNFT is a multi-reward RL framework for flow-matching Real-ISR. Experiments show that it consistently improves strong flow-based Real-ISR baselines, achieving better perceptual-faithfulness trade-offs with stable optimization dynamics across diverse real-world scenarios.

Abstract

Generative real-world image super-resolution (Real-ISR) can synthesize visually convincing details from severely degraded low-resolution (LR) inputs, yet its stochastic sampling makes a critical failure mode hard to avoid: outputs may look sharp but be unfaithful to the LR evidence (semantic and structural hallucination), while such LR-anchored faithfulness is difficult to assess without HR ground truth. Preference-based reinforcement learning (RL) is a natural fit because each LR input yields a rollout group of candidates to compare. However, effective alignment in Real-ISR is hindered by (i) the lack of a degradation-robust LR-referenced faithfulness signal, and (ii) a rollout-group optimization bottleneck where naive multi-reward scalarization followed by normalization compresses objective-wise contrasts, causing advantage collapse and weakening the reward-weighted updates in DiffusionNFT-style forward fine-tuning. Moreover, (iii) limited coverage of real degradations restricts rollout diversity and preference signal quality. We propose LucidNFT, a multi-reward RL framework for flow-matching Real-ISR. LucidNFT introduces LucidConsistency, a degradation-robust semantic evaluator that makes LR-anchored faithfulness measurable and optimizable; a decoupled advantage normalization strategy that preserves objective-wise contrasts within each LR-conditioned rollout group before fusion, preventing advantage collapse; and LucidLR, a large-scale collection of real-world degraded images to support robust RL fine-tuning. Experiments show that LucidNFT consistently improves strong flow-based Real-ISR baselines, achieving better perceptual-faithfulness trade-offs with stable optimization dynamics across diverse real-world scenarios.
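The advantage-collapse argument above can be illustrated with a toy rollout group. The sketch below is a minimal illustration, not the authors' implementation. It assumes two rewards per candidate (a consistency score in [0, 1] and an IQA score on a 0-100 scale), equal fusion weights, fusion by weighted sum, and per-group z-score normalization with a small epsilon. All function names are hypothetical.

```python
from statistics import mean, pstdev


def group_normalize(values, eps=1e-6):
    """Z-score a list of scalars within one LR-conditioned rollout group."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def naive_advantages(rewards, weights):
    """Scalarize first (weighted sum per candidate), then normalize the fused reward once."""
    fused = [sum(w * r for w, r in zip(weights, cand)) for cand in rewards]
    return group_normalize(fused)


def decoupled_advantages(rewards, weights):
    """Normalize each objective within the group first, then fuse with the weights."""
    num_candidates = len(rewards)
    per_objective = [
        group_normalize([cand[k] for cand in rewards]) for k in range(len(weights))
    ]
    return [
        sum(w * adv[i] for w, adv in zip(weights, per_objective))
        for i in range(num_candidates)
    ]


# Toy group: M = 4 candidates scored by (consistency in [0, 1], IQA on a 0-100 scale).
group = [(0.90, 50.0), (0.70, 50.0), (0.90, 60.0), (0.70, 60.0)]
weights = (1.0, 1.0)  # assumed equal weights

naive = naive_advantages(group, weights)
decoupled = decoupled_advantages(group, weights)
print("naive:    ", [round(a, 3) for a in naive])
print("decoupled:", [round(a, 3) for a in decoupled])
```

In this toy group, candidates 0 and 1 share the same IQA score but differ in consistency. Under naive scalarization their advantages differ by only about 0.04, because the large-scale IQA term dominates the fused reward's spread. Under decoupled normalization the gap is about 2, so the consistency contrast survives fusion.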

Paper Structure

This paper contains 19 sections, 18 equations, 5 figures, and 4 tables.

Introduction
Related Work
  Generative Real-World Image Super-Resolution
  Evaluation Methods for Real-World Image Super-Resolution
  Reinforcement Learning for Vision Generation Models
  Real-World Image Super-Resolution Datasets
Preliminaries
  Flow Matching
  Diffusion Negative-aware FineTuning
Method
  LucidConsistency: Degradation-Robust Consistency Evaluation
  Multi-Reward Reinforcement Learning for Real-ISR
  LucidLR: A Large-Scale Real-World Degradation Dataset for Real-ISR
Experiment
  Implementation Details
...and 4 more sections

Figures (5)

Figure 1: Overview of LucidConsistency. Left: inference stage, where embeddings of the LR input and the SR output are extracted and their semantic consistency is computed via Eq. (eq:compute_similarity). Right: training stage, where LR–HR pairs are used to optimize the projection head.
Figure 2: Advantage separability analysis on the LucidFlux backbone using the RealLQ250 dataset (DreamClear). (a) DAGC versus rollout count $M$; (b) mean pairwise advantage gap $|\Delta A|$ versus $M$, using the pair with the largest reward gap (top-1 max-$\Delta r$ pair) in each group; (c) distribution of $|\Delta A|$ at $M=12$. LucidNFT consistently yields larger advantage gaps and higher separability than DiffusionNFT, indicating reduced advantage compression under decoupled normalization.
Figure 3: Representative examples from LucidLR.
Figure 4: Training dynamics of LucidNFT on LucidFlux. From left to right: training LucidConsistency score, evaluation LucidConsistency score, training UniPercept IQA score, and evaluation UniPercept IQA score. The smoothed curves exhibit a consistent upward trend, indicating stable multi-reward optimization during RL.
Figure 5: Visual comparison on RealLQ250 (DreamClear). LucidNFT further improves semantic consistency and perceptual quality over the LucidFlux baseline, producing more faithful structures and richer texture details.
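The inference stage in Figure 1 can be sketched as a similarity score between projected LR and SR embeddings. This is a hypothetical illustration, not the LucidConsistency implementation. It assumes cosine similarity, a single linear projection head shared by both inputs, and plain Python lists standing in for encoder embeddings. The training loss shown (one minus the similarity on an LR–HR pair) is also an assumption; the actual encoder, head, and objective may differ.

```python
import math


def project(embedding, head):
    """Apply a linear projection head, given as a list of weight rows."""
    return [sum(w * x for w, x in zip(row, embedding)) for row in head]


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def consistency_score(lr_embedding, sr_embedding, head):
    """Hypothetical LR-anchored score: similarity of projected LR and SR embeddings."""
    return cosine_similarity(project(lr_embedding, head), project(sr_embedding, head))


def alignment_loss(lr_embedding, hr_embedding, head):
    """Hypothetical training loss on an LR-HR pair: 1 - similarity (lower is better)."""
    return 1.0 - consistency_score(lr_embedding, hr_embedding, head)


# Toy 3-D embeddings; this fixed head keeps dims 0-1 and drops dim 2,
# which here stands in for degradation-sensitive features a trained head would ignore.
head = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
lr = [1.0, 2.0, 5.0]
sr_faithful = [1.1, 2.1, -3.0]
sr_hallucinated = [-2.0, 1.0, 5.0]

print(consistency_score(lr, sr_faithful, head))
print(consistency_score(lr, sr_hallucinated, head))
```

In this toy setup the faithful output scores close to 1 and the hallucinated one scores 0, even though the hallucinated output matches the LR input exactly on the dropped dimension.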
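Panel (b) of Figure 2 reports the mean pairwise advantage gap using each group's top-1 max-Δr pair. The sketch below is one plausible reading of that metric, not the authors' evaluation code. It assumes each candidate has a single scalar reward (for example, a fused reward) used to select the pair, and that per-candidate advantages are already computed. DAGC is not defined in this section, so it is not sketched. Function names are hypothetical.

```python
from itertools import combinations


def max_reward_gap_pair(rewards):
    """Indices (i, j) of the pair with the largest |r_i - r_j| in one rollout group."""
    pairs = combinations(range(len(rewards)), 2)
    return max(pairs, key=lambda p: abs(rewards[p[0]] - rewards[p[1]]))


def mean_pairwise_advantage_gap(groups):
    """Average |A_i - A_j| over groups, using each group's top-1 max-delta-r pair.

    groups: list of (rewards, advantages) tuples, one per LR input,
    where both lists are indexed by candidate.
    """
    gaps = []
    for rewards, advantages in groups:
        i, j = max_reward_gap_pair(rewards)
        gaps.append(abs(advantages[i] - advantages[j]))
    return sum(gaps) / len(gaps)


# Two toy groups with hypothetical rewards and advantages.
groups = [
    ([0.1, 0.5, 0.9], [-1.2, 0.0, 1.2]),
    ([0.3, 0.2, 0.4, 0.8], [-0.5, -0.9, 0.0, 1.4]),
]
gap = mean_pairwise_advantage_gap(groups)
print(gap)
```

Here the selected pairs are candidates (0, 2) and (1, 3), with advantage gaps of 2.4 and 2.3, so the mean is 2.35. A normalization scheme that compresses advantages would shrink these gaps even when the reward gaps are unchanged.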
