StructSR: Refuse Spurious Details in Real-World Image Super-Resolution

Yachao Li, Dong Liang, Tianyu Ding, Sheng-Jun Huang

TL;DR

StructSR tackles spurious structural artifacts in diffusion-based Real-ISR by leveraging intra-inference structure cues. It introduces Structure-Aware Screening (SAS) to select a structure-aligned latent embedding, Structure Condition Embedding (SCE) to guide noise prediction, and Image Details Embedding (IDE) to gently inject structural cues during inference. The approach is plug-and-play, requiring no fine-tuning or external priors, and it improves PSNR and SSIM across multiple baselines on synthetic and real-world datasets while reducing artifacts. Overall, StructSR provides a practical, inference-time strategy to enhance structural fidelity and suppress spurious textures in Real-ISR, with broad compatibility and demonstrated performance gains.

Abstract

Diffusion-based models have shown great promise in real-world image super-resolution (Real-ISR), but they often generate content with structural errors and spurious texture details due to the empirical priors and illusions of these models. To address this issue, we introduce StructSR, a simple, effective, and plug-and-play method that enhances structural fidelity and suppresses spurious details for diffusion-based Real-ISR. StructSR operates without additional fine-tuning, external model priors, or high-level semantic knowledge. At its core is the Structure-Aware Screening (SAS) mechanism, which identifies the image with the highest structural similarity to the low-resolution (LR) input in the early inference stage, allowing us to leverage it as historical structure knowledge to suppress the generation of spurious details. By intervening in the diffusion inference process, StructSR integrates seamlessly with existing diffusion-based Real-ISR models. Our experimental results demonstrate that StructSR significantly improves the fidelity of structure and texture, raising the PSNR and SSIM metrics by an average of 5.27% and 9.36% on a synthetic dataset (DIV2K-Val) and 4.13% and 8.64% on two real-world datasets (RealSR and DRealSR) when integrated with four state-of-the-art diffusion-based Real-ISR methods.
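The screening step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the early-stage estimates have already been decoded and resampled to the LR resolution as flat lists of grayscale values in [0, 1], and it swaps the usual windowed SSIM for a simplified single-window (global-statistics) SSIM. The names `global_ssim`, `screen_structure`, and `early_estimates` are hypothetical.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0):
    """SSIM from whole-image statistics (one window, no Gaussian weighting).

    Simplification: standard SSIM averages this quantity over local windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and equally sized")
    c1 = (0.01 * data_range) ** 2  # usual SSIM stabilising constants
    c2 = (0.03 * data_range) ** 2
    mx, my = fmean(x), fmean(y)
    vx = fmean([(a - mx) ** 2 for a in x])
    vy = fmean([(b - my) ** 2 for b in y])
    cov = fmean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )


def screen_structure(lr_pixels, early_estimates):
    """Pick the early-stage estimate whose structure best matches the LR input.

    early_estimates: list of (latent, decoded_pixels) pairs, one per early
    timestep (hypothetical layout). Returns (ssim, step, latent) or None.
    """
    best = None
    for step, (latent, decoded_pixels) in enumerate(early_estimates):
        score = global_ssim(lr_pixels, decoded_pixels)
        if best is None or score > best[0]:
            best = (score, step, latent)
    return best
```

Only the early timesteps are passed in, so the selected latent reflects structure found before later steps have a chance to add spurious detail.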

Paper Structure (16 sections, 6 equations, 7 figures, 4 tables, 1 algorithm)

This paper contains 16 sections, 6 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Comparison of diffusion-based Real-ISR methods with and without StructSR integration. The original methods generate spurious details in both English letters and Chinese characters. Integration with StructSR significantly reduces these artifacts, resulting in more accurate reconstruction.
  • Figure 1: Ablation study on SCE and IDE with state-of-the-art diffusion-based Real-ISR baselines. Integration with StructSR generates high-fidelity structures by combining the clear structural guidance provided by SCE and the suppression of spurious details by IDE.
  • Figure 2: Comparison of the structural similarity (SSIM) between LR images with different degradation degrees and the images reconstructed at successive timesteps during the inference process. The calculated SSIM values are shown above the reconstructed images, with the maximum SSIM value in red. The red boxes mark structural errors and spurious details. The SSIM does not remain stable, indicating that spurious structure and texture details are generated in the later stage of inference.
  • Figure 2: Comparison of the SSIM between LR images with different degradation degrees and the images reconstructed at successive timesteps during the StableSR inference process, with total inference timesteps $T = 200$.
  • Figure 3: In the proposed StructSR, the Structure-Aware Screening (SAS) works in the early inference stage and screens out the structural embedding $Z_{SE}$ with the most consistent and clearest structure compared to the LR image. In the later inference stage, the Structure Condition Embedding (SCE) uses $Z_{SE}$ to guide $\epsilon_t$ in conjunction with the LR image. The Image Details Embedding (IDE) inserts $Z_{SE}$ into the clean latent image $Z_{0|t}$ at each timestep according to the degradation degree.
  • ...and 2 more figures
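
The IDE step in the Figure 3 caption can be pictured as mixing $Z_{SE}$ into the clean latent $Z_{0|t}$ at each timestep. The sketch below is only an illustration under stated assumptions: the caption says the insertion depends on the degradation degree but does not give the rule, so the convex blend and the `weight` parameter here are hypothetical stand-ins, not the paper's formula. Latents are represented as flat lists of floats.

```python
def embed_details(z0_t, z_se, weight):
    """Blend the screened structural embedding into the clean latent estimate.

    z0_t   : clean latent estimate at the current timestep (flat list of floats)
    z_se   : structural embedding selected in the early stage (same length)
    weight : hypothetical mixing factor in [0, 1]; assumed here to be derived
             elsewhere from the degradation degree of the LR input
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    if len(z0_t) != len(z_se):
        raise ValueError("latents must have the same size")
    # Convex combination: weight = 0 leaves z0_t unchanged, weight = 1 returns z_se.
    return [(1.0 - weight) * a + weight * b for a, b in zip(z0_t, z_se)]
```

With `weight = 0` the baseline model's estimate passes through untouched, which matches the plug-and-play framing: the intervention can be dialled down to nothing without changing the underlying sampler.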