Semantic Segmentation Prior for Diffusion-Based Real-World Super-Resolution
Jiahua Xiao, Jiawei Zhang, Dongqing Zou, Xiaodan Zhang, Jimmy Ren, Xing Wei
TL;DR
This work tackles two failure modes of diffusion-based Real-ISR: semantic mislocalization and semantic ambiguity. It introduces SegSR, a dual-diffusion framework that couples a diffusion-based SR model (SRDM) with a diffusion-based segmentation model (SegDM) through a Dual-Modality Bridge (DMB), so that image content and segmentation priors refine each other during reverse diffusion. By using pixel-level segmentation labels as priors, SegSR improves semantic fidelity while maintaining perceptual realism, outperforming several state-of-the-art methods on synthetic and real-world benchmarks, particularly on no-reference quality metrics. The approach demonstrates that integrating segmentation priors into generative restoration can improve both semantic accuracy and visual quality, with practical impact for real-world image enhancement tasks.
Abstract
Real-world image super-resolution (Real-ISR) has achieved a remarkable leap by leveraging large-scale text-to-image models, enabling realistic image restoration guided by textual prompts obtained through recognition. However, these methods sometimes fail to recognize salient objects, resulting in inaccurate semantic restoration in those regions. Additionally, the same region may respond strongly to more than one prompt, which leads to semantic ambiguity during super-resolution. To alleviate these two issues, we propose using semantic segmentation as an additional control condition in diffusion-based image super-resolution. Compared to textual prompt conditions, semantic segmentation enables a more comprehensive perception of salient objects within an image by assigning a class label to each pixel. It also mitigates the risk of semantic ambiguity by explicitly allocating objects to their respective spatial regions. In practice, inspired by the fact that image super-resolution and segmentation can benefit each other, we propose SegSR, which introduces a dual-diffusion framework to facilitate interaction between the image super-resolution and segmentation diffusion models. Specifically, we develop a Dual-Modality Bridge module to enable updated information flow between these two diffusion models, achieving mutual benefit during the reverse diffusion process. Extensive experiments show that SegSR can generate realistic images while preserving semantic structures more effectively.
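The coupling described above can be pictured as two reverse-diffusion trajectories that exchange information as they run. The following is only a rough control-flow sketch, not the SegSR implementation: the names `coupled_reverse_diffusion`, `swap_bridge`, and `toy_step` are hypothetical, the denoisers are toy stand-ins rather than SRDM or SegDM, and exchanging information at every timestep is an assumption, since the abstract does not specify the schedule.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Step = Callable[[Vector, Vector, int], Vector]
Bridge = Callable[[Vector, Vector], Tuple[Vector, Vector]]


def coupled_reverse_diffusion(
    sr_state: Vector,
    seg_state: Vector,
    sr_step: Step,
    seg_step: Step,
    bridge: Bridge,
    num_steps: int,
) -> Tuple[Vector, Vector]:
    """Run two reverse processes side by side (hypothetical sketch).

    Before each step, the bridge turns the current pair of states into a
    hint for each branch, so each branch sees the other's latest estimate.
    """
    for t in reversed(range(num_steps)):  # t = num_steps-1, ..., 0
        sr_hint, seg_hint = bridge(sr_state, seg_state)
        # Both updates use the states from before this step.
        sr_state, seg_state = (
            sr_step(sr_state, sr_hint, t),
            seg_step(seg_state, seg_hint, t),
        )
    return sr_state, seg_state


def swap_bridge(sr_state: Vector, seg_state: Vector) -> Tuple[Vector, Vector]:
    """Toy bridge (assumption): each branch receives the other's state."""
    return list(seg_state), list(sr_state)


def toy_step(state: Vector, hint: Vector, t: int) -> Vector:
    """Toy denoiser stand-in: move halfway toward the hint."""
    return [0.5 * (s + h) for s, h in zip(state, hint)]


sr_out, seg_out = coupled_reverse_diffusion(
    [0.0, 0.0], [1.0, 1.0], toy_step, toy_step, swap_bridge, num_steps=3
)
```

In this toy setting the two states simply pull toward each other; in a real system each step would be a learned denoiser and the bridge a learned module operating on image and segmentation features.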
