OARS: Process-Aware Online Alignment for Generative Real-World Image Super-Resolution

Shijie Zhao; Xuanyu Zhang; Bin Chen; Weiqi Li; Qunliang Xing; Kexin Zhang; Yan Wang; Junlin Li; Li Zhang; Jian Zhang; Tianfan Xue

Abstract

Aligning generative real-world image super-resolution models with human visual preference is challenging due to the perception-fidelity trade-off and diverse, unknown degradations. Prior approaches rely on offline preference optimization and static metric aggregation, which are often non-interpretable and prone to pseudo-diversity under strong conditioning. We propose OARS, a process-aware online alignment framework built on COMPASS, an MLLM-based reward that evaluates the LR-to-SR transition by jointly modeling fidelity preservation and perceptual gain with an input-quality-adaptive trade-off. To train COMPASS, we curate COMPASS-20K, which spans synthetic and real degradations, and introduce a three-stage perceptual annotation pipeline that yields calibrated, fine-grained training labels. Guided by COMPASS, OARS performs progressive online alignment in three stages: cold-start flow matching, then full-reference RL, and finally reference-free RL, using shallow LoRA optimization for on-policy exploration. Extensive experiments and user studies demonstrate consistent perceptual improvements while maintaining fidelity, achieving state-of-the-art performance on Real-ISR benchmarks.
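To make the idea of an input-quality-adaptive trade-off concrete, the sketch below blends a fidelity score and a perceptual-gain score with a weight that depends on input quality. This is a hypothetical illustration only: the abstract does not give the COMPASS formula, and the function name `adaptive_reward`, the [0, 1] score ranges, the linear weighting, and the direction of the weighting (cleaner inputs favor fidelity) are all assumptions made here for illustration.

```python
def adaptive_reward(fidelity: float, perceptual_gain: float, input_quality: float) -> float:
    """Hypothetical blend of two scores, each assumed to lie in [0, 1].

    This is not the COMPASS reward; it only illustrates what an
    input-quality-adaptive trade-off could look like.
    """
    for name, value in (
        ("fidelity", fidelity),
        ("perceptual_gain", perceptual_gain),
        ("input_quality", input_quality),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")

    # Assumption: a cleaner input leaves less to hallucinate, so fidelity
    # gets more weight; a heavily degraded input shifts weight toward
    # perceptual gain.
    fidelity_weight = input_quality
    return fidelity_weight * fidelity + (1.0 - fidelity_weight) * perceptual_gain


if __name__ == "__main__":
    # Same SR scores, evaluated under a clean and a heavily degraded input.
    print(adaptive_reward(0.9, 0.2, input_quality=1.0))
    print(adaptive_reward(0.9, 0.2, input_quality=0.0))
```

Under these assumptions, the same pair of scores yields a fidelity-dominated reward for a clean input and a perception-dominated reward for a badly degraded one. A learned, non-linear weighting would be an equally plausible reading of the abstract.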

Paper Structure (25 sections, 7 equations, 8 figures, 9 tables)

Introduction
Related Work
COMPASS: COMposite Process-Aware SR Score
Motivation
COMPASS-20K
COMPASS Reward
OARS: Online Alignment for Real-world ISR
Experiment
Experimental Setup
Reward Experimental Results
Online RL on Super-Resolution Experimental Results
Ablation Studies
Comparison with DPO-based SR Methods
Conclusion
Appendix
...and 10 more sections

Figures (8)

Figure 1: Overview of the COMPASS-20K dataset construction and the COMPASS reward framework. (a) Data generation and score label annotation pipeline. (b) COMPASS uses an input-quality-adaptive mechanism to balance fidelity preservation and perceptual gain conditioned on input quality.
Figure 2: Overview of the proposed OARS framework. (a) Policy optimization process of our OARS framework; (b) Cold-start stage with paired LR-HR supervision; (c) Full-reference RL stage; (d) Non-reference RL stage on unpaired data guided by our reward.
Figure 3: Subjective user study results. OARS achieves the highest preference rate, receiving 47.62% of the total votes.
Figure 4: Qualitative comparison of OARS against state-of-the-art Real-ISR methods under complex degradations.
Figure 5: Examples from the real-world low-quality subset of COMPASS-20K.
...and 3 more figures
