Enhanced Semantic Extraction and Guidance for UGC Image Super Resolution
Yiwen Wang, Ying Liang, Yuxuan Zhang, Xinning Chai, Zhengxue Cheng, Yingsheng Qin, Yucai Yang, Rong Xie, Li Song
TL;DR
This work addresses the gap between real-world UGC degradations and synthetic degradations in single-image super-resolution by embedding semantic guidance into a diffusion-based framework. It builds a more realistic training regime by combining degradations simulated on LSDIR with synthetic UGC data, and it uses SAM2 for high-level semantic conditioning alongside ControlNet to preserve structure. The semantic-aware module, integrated into the diffusion denoising process, improves both perceptual fidelity and semantic coherence. This is demonstrated through quantitative and qualitative experiments, with strong performance on wild UGC data and competitive results on synthetic data and DIV2K. The approach narrows the domain gap between synthetic and real-world degradations and offers a robust solution for practical UGC image enhancement, with room for further improvement in text regions and artifact control.
Abstract
Due to the disparity between real-world degradations in user-generated content (UGC) images and synthetic degradations, traditional super-resolution methods struggle to generalize effectively, necessitating a more robust approach to model real-world distortions. In this paper, we propose a novel approach to UGC image super-resolution by integrating semantic guidance into a diffusion framework. Our method addresses the inconsistency between degradations in wild and synthetic datasets by separately simulating the degradation processes on the LSDIR dataset and combining them with the official paired training set. Furthermore, we enhance degradation removal and detail generation by incorporating a pretrained semantic extraction model (SAM2) and fine-tuning key hyperparameters for improved perceptual fidelity. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods. Additionally, the proposed model won second place in the CVPR NTIRE 2025 Short-form UGC Image Super-Resolution Challenge, further validating its effectiveness. The code is available at https://github.com/Moonsofang/NTIRE-2025-SRlab.
