OARS: Process-Aware Online Alignment for Generative Real-World Image Super-Resolution

Shijie Zhao, Xuanyu Zhang, Bin Chen, Weiqi Li, Qunliang Xing, Kexin Zhang, Yan Wang, Junlin Li, Li Zhang, Jian Zhang, Tianfan Xue

Abstract

Aligning generative real-world image super-resolution (Real-ISR) models with human visual preference is challenging because of the perception-fidelity trade-off and diverse, unknown degradations. Prior approaches rely on offline preference optimization and static metric aggregation, which are often non-interpretable and prone to pseudo-diversity under strong conditioning. We propose OARS, a process-aware online alignment framework built on COMPASS, an MLLM-based reward that evaluates the LR-to-SR transition by jointly modeling fidelity preservation and perceptual gain with an input-quality-adaptive trade-off. To train COMPASS, we curate COMPASS-20K, which spans synthetic and real degradations, and introduce a three-stage perceptual annotation pipeline that yields calibrated, fine-grained training labels. Guided by COMPASS, OARS performs progressive online alignment, moving from cold-start flow matching to full-reference RL and finally to reference-free RL, using shallow LoRA optimization for on-policy exploration. Extensive experiments and user studies demonstrate consistent perceptual improvements while maintaining fidelity, achieving state-of-the-art performance on Real-ISR benchmarks.

Paper Structure (25 sections, 7 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: Overview of the COMPASS-20K dataset construction and the COMPASS reward framework. (a) Data generation and score label annotation pipeline. (b) COMPASS uses an input-quality-adaptive mechanism to balance fidelity preservation and perceptual gain conditioned on input quality.
  • Figure 2: Overview of the proposed OARS framework. (a) Policy optimization process of our OARS framework. (b) Cold-start stage with paired LR-HR supervision. (c) Full-reference RL stage. (d) Non-reference RL stage on unpaired data guided by our reward.
  • Figure 3: Subjective user study results. OARS achieves the highest preference rate, receiving 47.62% of the total votes.
  • Figure 4: Qualitative comparison of OARS against state-of-the-art Real-ISR methods under complex degradations.
  • Figure 5: Examples from the real-world low-quality subset of COMPASS-20K.
  • ...and 3 more figures
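
The input-quality-adaptive trade-off mentioned in the abstract and in the Figure 1(b) caption can be illustrated with a toy sketch. This is not the COMPASS reward itself, which is an MLLM-based model whose exact formulation is not given in this summary. The class `TransitionScores`, the function `adaptive_reward`, the [0, 1] score range, and the linear weighting rule are all hypothetical and chosen only to show one way a reward could shift between fidelity preservation and perceptual gain depending on input quality.

```python
from dataclasses import dataclass


@dataclass
class TransitionScores:
    """Hypothetical scores for one LR-to-SR transition, each assumed in [0, 1]."""
    input_quality: float    # perceived quality of the LR input
    fidelity: float         # how well the SR output preserves LR content
    perceptual_gain: float  # perceptual improvement of the SR output over the LR input


def adaptive_reward(s: TransitionScores) -> float:
    """Blend fidelity and perceptual gain with a weight tied to input quality.

    Assumption (not taken from the paper): the fidelity weight equals the
    input quality, so cleaner inputs favor preservation and heavily degraded
    inputs favor perceptual gain.
    """
    for value in (s.input_quality, s.fidelity, s.perceptual_gain):
        if not 0.0 <= value <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    w = s.input_quality  # weight on fidelity; (1 - w) goes to perceptual gain
    return w * s.fidelity + (1.0 - w) * s.perceptual_gain


if __name__ == "__main__":
    clean = TransitionScores(input_quality=0.9, fidelity=0.8, perceptual_gain=0.2)
    degraded = TransitionScores(input_quality=0.1, fidelity=0.8, perceptual_gain=0.2)
    print(adaptive_reward(clean), adaptive_reward(degraded))
```

Under this assumed rule, the same pair of fidelity and perceptual-gain scores yields a higher reward for a clean input than for a degraded one whenever fidelity exceeds perceptual gain, which is the qualitative behavior an input-conditioned balance is meant to capture.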