SSL: A Self-similarity Loss for Improving Generative Image Super-resolution

Du Chen, Zhengqiang Zhang, Jie Liang, Lei Zhang

TL;DR

The paper tackles the visual artifacts and false structures that GAN- and diffusion-based models produce in real-world image super-resolution (Real-ISR) by introducing a self-similarity loss (SSL) that exploits the inherent self-similarity of natural images. SSL computes a self-similarity graph (SSG) from the ground-truth image and enforces that the SSG of the Real-ISR output stays close to it, restricting the computation to edge regions via a mask generated offline from the ground-truth image. Formally, SSL combines a KL-divergence term and an L1 term between the normalized SSGs: $L_{SSL} = D_{KL}(\bar{S}_{HR} \,\|\, \bar{S}_{SR}) + \alpha \lVert \bar{S}_{SR} - \bar{S}_{HR} \rVert_1$ with $\alpha=1$, and serves as a plug-and-play penalty for both GAN- and DM-based Real-ISR models. Across experiments on a range of models and degradations, SSL consistently improves perceptual realism and reduces artifacts. Code is available at the authors' repository.
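
As a rough illustration of the loss stated above, the following Python sketch evaluates the KL term plus the α-weighted L1 term for two SSGs that are already normalized. The function name `ssl_loss`, the flat-list representation, the `eps` stabilizer, and summing (rather than averaging) the L1 term are assumptions made for this example, not details taken from the authors' code.

```python
import math


def ssl_loss(s_hr, s_sr, alpha=1.0, eps=1e-8):
    """Sketch of L_SSL = D_KL(s_hr || s_sr) + alpha * |s_sr - s_hr|.

    s_hr, s_sr: flat sequences of non-negative similarities, each assumed
    to sum to 1 (i.e. normalized SSGs for the same edge pixel).
    eps is a hypothetical stabilizer that avoids log(0).
    """
    if len(s_hr) != len(s_sr):
        raise ValueError("both SSGs must have the same number of entries")
    # KL divergence with the ground-truth SSG as the reference distribution.
    kl = sum(p * math.log((p + eps) / (q + eps)) for p, q in zip(s_hr, s_sr))
    # L1 distance, summed over all entries (an assumption; a mean also works).
    l1 = sum(abs(q - p) for p, q in zip(s_hr, s_sr))
    return kl + alpha * l1


# Identical graphs incur no penalty; a mismatched graph is penalized.
loss_same = ssl_loss([0.5, 0.5], [0.5, 0.5])
loss_diff = ssl_loss([0.5, 0.5], [0.9, 0.1])
```

In a training loop, a value like this would presumably be added to the model's existing objective, which is what makes the penalty plug-and-play.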

Abstract

Generative adversarial networks (GAN) and generative diffusion models (DM) have been widely used in real-world image super-resolution (Real-ISR) to enhance the image perceptual quality. However, these generative models are prone to generating visual artifacts and false image structures, resulting in unnatural Real-ISR results. Based on the fact that natural images exhibit high self-similarities, i.e., a local patch can have many similar patches to it in the whole image, in this work we propose a simple yet effective self-similarity loss (SSL) to improve the performance of generative Real-ISR models, enhancing the hallucination of structural and textural details while reducing the unpleasant visual artifacts. Specifically, we compute a self-similarity graph (SSG) of the ground-truth image, and enforce the SSG of Real-ISR output to be close to it. To reduce the training cost and focus on edge areas, we generate an edge mask from the ground-truth image, and compute the SSG only on the masked pixels. The proposed SSL serves as a general plug-and-play penalty, which could be easily applied to the off-the-shelf Real-ISR models. Our experiments demonstrate that, by coupling with SSL, the performance of many state-of-the-art Real-ISR models, including those GAN and DM based ones, can be largely improved, reproducing more perceptually realistic image details and eliminating many false reconstructions and visual artifacts. Codes and supplementary material can be found at https://github.com/ChrisDud0257/SSL

Paper Structure (11 sections, 7 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: From left to right and top to bottom: the Real-ISR results generated by SwinIRGAN [liang2021swinir], StableSR [wang2023exploiting], our SSL-guided StableSR, and the ground-truth (GT) image. SwinIRGAN produces over-smoothed and incorrect results, while StableSR produces more details but with false structures and artifacts. Our SSL-guided StableSR generates more faithful details while suppressing many of the artifacts.
  • Figure 2: Illustration of the training process of (a) generative adversarial network (GAN) based and (b) latent diffusion model (DM) based Real-ISR using our proposed self-similarity loss (SSL). The GAN or DM network maps the input LR image to an ISR output. We compute the self-similarity graphs (SSG) of both the ISR output and the ground-truth (GT) image, and compute the SSL between them to supervise the generation of image details and structures.
  • Figure 3: Illustration of the self-similarity graph (SSG) computing process. We first generate a mask that indicates the image edge areas by applying the Laplacian operator to the GT image. During training, for each edge pixel in the mask, we find the corresponding pixels in the GT image and the ISR image, and set a search area centred at each of them. A local sliding window is used to calculate the similarity between each pixel in the search area and the central pixel, so that an SSG is computed for the GT image and for the ISR image; the SSL is then computed from the two SSGs. The red pixel marks an edge pixel, and the blue block marks the sliding window.
  • Figure 4: Visual comparison of state-of-the-art GAN-based Real-ISR models and their counterparts trained with our SSL. The bicubic degradation model is used here. From the top row to the bottom row are the results of bicubic interpolation, the original Real-ISR model, the Real-ISR model trained with our SSL, and the GT image. Please zoom in for better observation.
  • Figure 5: Visual comparison of state-of-the-art DM-based Real-ISR models and their counterparts trained with our SSL. From the top row to the bottom row are the results of bicubic interpolation, the original Real-ISR model, the Real-ISR model trained with our SSL, and the GT image. Please zoom in for better observation.
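
The SSG computation described in the Figure 3 caption can be sketched in dependency-free Python as below. This is only an illustration under stated assumptions: the 4-neighbour Laplacian stencil, the threshold, the Gaussian kernel of the squared window distance with scale `h`, and normalization by the sum are choices made for this example, and the names `laplacian_edge_mask`, `window`, and `ssg_at` are hypothetical rather than taken from the authors' repository. A practical implementation would use batched tensor operations instead of Python loops.

```python
import math


def laplacian_edge_mask(img, threshold):
    """Mark interior pixels whose absolute 4-neighbour Laplacian exceeds threshold."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1]
                   + img[y][x + 1] - 4 * img[y][x])
            mask[y][x] = abs(lap) > threshold
    return mask


def window(img, cy, cx, r):
    """Flattened (2r+1) x (2r+1) sliding window centred at (cy, cx)."""
    return [img[y][x]
            for y in range(cy - r, cy + r + 1)
            for x in range(cx - r, cx + r + 1)]


def ssg_at(img, cy, cx, search_r, win_r, h=1.0):
    """Normalized similarities between the central window and each window
    centred on a pixel of the search area around (cy, cx)."""
    margin = search_r + win_r
    if not (margin <= cy < len(img) - margin
            and margin <= cx < len(img[0]) - margin):
        raise ValueError("search area plus window must fit inside the image")
    centre = window(img, cy, cx, win_r)
    sims = []
    for y in range(cy - search_r, cy + search_r + 1):
        for x in range(cx - search_r, cx + search_r + 1):
            cand = window(img, y, x, win_r)
            dist2 = sum((a - b) ** 2 for a, b in zip(centre, cand))
            # Assumed similarity: Gaussian kernel of the squared distance.
            sims.append(math.exp(-dist2 / h))
    total = sum(sims)
    return [s / total for s in sims]


# Toy 7x7 "GT image" with a vertical step edge between columns 2 and 3.
gt = [[0.0 if x < 3 else 1.0 for x in range(7)] for _ in range(7)]
mask = laplacian_edge_mask(gt, threshold=0.5)
# SSGs are computed only at masked (edge) pixels whose search area fits.
ssgs = {(y, x): ssg_at(gt, y, x, search_r=1, win_r=1)
        for y in range(2, 5) for x in range(2, 5) if mask[y][x]}
```

Running the same `ssg_at` call on the ISR output at the same edge pixels would give the second set of graphs that the loss compares against the GT ones.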