StructSR: Refuse Spurious Details in Real-World Image Super-Resolution

Yachao Li; Dong Liang; Tianyu Ding; Sheng-Jun Huang

TL;DR

StructSR tackles spurious structural artifacts in diffusion-based Real-ISR by exploiting structure cues that arise during inference itself. It introduces Structure-Aware Screening (SAS) to select a structure-aligned latent embedding in the early inference stage, Structure Condition Embedding (SCE) to guide noise prediction with that embedding, and Image Details Embedding (IDE) to inject it into the clean latent at each timestep. The approach is plug-and-play, requiring no fine-tuning or external priors, and it improves PSNR and SSIM when integrated with four diffusion-based baselines on synthetic and real-world datasets while reducing artifacts.

Abstract

Diffusion-based models have shown great promise in real-world image super-resolution (Real-ISR), but often generate content with structural errors and spurious texture details due to the empirical priors and illusions of these models. To address this issue, we introduce StructSR, a simple, effective, and plug-and-play method that enhances structural fidelity and suppresses spurious details for diffusion-based Real-ISR. StructSR operates without the need for additional fine-tuning, external model priors, or high-level semantic knowledge. At its core is the Structure-Aware Screening (SAS) mechanism, which identifies the image with the highest structural similarity to the low-resolution (LR) input in the early inference stage, allowing us to leverage it as historical structure knowledge to suppress the generation of spurious details. By intervening in the diffusion inference process, StructSR integrates seamlessly with existing diffusion-based Real-ISR models. Our experimental results demonstrate that StructSR significantly improves the fidelity of structure and texture, improving the PSNR and SSIM metrics by an average of 5.27% and 9.36% on a synthetic dataset (DIV2K-Val) and 4.13% and 8.64% on two real-world datasets (RealSR and DRealSR) when integrated with four state-of-the-art diffusion-based Real-ISR methods.
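The screening idea described above (pick the early-stage reconstruction that is structurally closest to the LR input) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `global_ssim` and `screen_structure` are hypothetical, the SSIM here is a simplified single-window (global) variant rather than the usual sliding-window form, and the candidates are assumed to be grayscale arrays already resized to the LR image's shape.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Simplified SSIM computed over the whole image as one window.

    Assumption: x and y are same-shape grayscale arrays in [0, data_range].
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    # Standard SSIM stabilizing constants (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def screen_structure(lr_image, candidates):
    """Return the index of the candidate with the highest SSIM to lr_image,
    together with all scores (hypothetical stand-in for the screening step)."""
    scores = [global_ssim(lr_image, c) for c in candidates]
    return int(np.argmax(scores)), scores


# Toy usage: three "intermediate reconstructions" simulated as noisy copies
# of the LR image; the least noisy copy should score highest.
rng = np.random.default_rng(0)
lr = rng.random((16, 16))
candidates = [lr + rng.normal(0.0, s, lr.shape) for s in (0.5, 0.05, 0.3)]
best_index, scores = screen_structure(lr, candidates)
```

In a real pipeline the candidates would be decoded intermediate predictions from the early inference timesteps, and a windowed SSIM implementation would likely be preferable to this global approximation.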

Paper Structure (16 sections, 6 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Methodology
Basic Definition in Diffusion-based Real-ISR
Role of Structural Similarity in Real-ISR
Structure-Aware Screening
Structure Condition Embedding
Image Details Embedding
Experiments
Experimental Settings
Comparison with the State-of-the-Art
Conclusion
Acknowledgments
Ablation Study
Effectiveness of SCE and IDE
...and 1 more sections

Figures (7)

Figure 1: Comparison of diffusion-based Real-ISR methods with and without StructSR integration. The original methods generate spurious details in both English letters and Chinese characters. Integration with StructSR significantly reduces these artifacts, resulting in more accurate reconstruction.
Figure 1: Ablation study on SCE and IDE with state-of-the-art diffusion-based Real-ISR baselines. Integration with StructSR generates high-fidelity structures by combining the clear structural guidance provided by SCE and the suppression of spurious details by IDE.
Figure 2: Comparison of the structural similarity (SSIM) between LR images with different degradation degrees and their temporal reconstructed images during the inference process. The calculated SSIM values are shown above the reconstructed images, with the maximum SSIM value in red. The red boxes mark structural errors and spurious details. The SSIM does not remain stable over the inference process, indicating that spurious structure and texture details are generated in the later stage of inference.
Figure 2: Comparison of the SSIM between LR images with different degradation degrees and their temporal reconstructed images during the StableSR inference process with total inference timesteps $T = 200$.
Figure 3: In the proposed StructSR, the Structure-Aware Screening (SAS) works in the early inference stage and screens out the structural embedding $Z_{SE}$ with the most consistent and clearest structure compared to the LR image. In the later inference stage, the Structure Condition Embedding (SCE) uses $Z_{SE}$ to guide $\epsilon_t$ in conjunction with the LR image. The Image Details Embedding (IDE) inserts $Z_{SE}$ into the clean latent image $Z_{0|t}$ at each timestep according to the degradation degree.
...and 2 more figures
