Exploring Semantic Feature Discrimination for Perceptual Image Super-Resolution and Opinion-Unaware No-Reference Image Quality Assessment

Guanglu Dong, Xiangyu Liao, Mingyang Li, Guihuan Guo, Chao Ren

TL;DR

This work tackles the limitation of coarse image-wise discriminators in GAN-based SISR by introducing semantic feature discrimination (SFD) that leverages CLIP-derived semantic features. The framework comprises a feature discriminator (Feat-D) operating on middle CLIP features and a text-guided discriminator (TG-D) using Learnable Prompt Pairs (LPP) on the final CLIP output, jointly guiding the SRN toward realistic, semantically coherent textures. An extension, SFD-IQA, reuses Feat-D and LPP to achieve improved opinion-unaware NR-IQA performance without task-specific IQA training. Extensive experiments on classical SISR, real-world SISR, and OU NR-IQA demonstrate better perception-distortion trade-offs and superior OU NR-IQA accuracy, validating the effectiveness and generality of semantic-feature-based discrimination for image restoration and quality assessment.

Abstract

Generative Adversarial Networks (GANs) have been widely applied to image super-resolution (SR) to enhance perceptual quality. However, most existing GAN-based SR methods perform coarse-grained discrimination directly on images and ignore their semantic information, making it challenging for the super-resolution network (SRN) to learn fine-grained, semantically relevant texture details. To alleviate this issue, we propose a semantic feature discrimination method, SFD, for perceptual SR. Specifically, we first design a feature discriminator (Feat-D) to discriminate the pixel-wise middle semantic features from CLIP, aligning the feature distribution of SR images with that of high-quality images. Additionally, we propose a text-guided discrimination method (TG-D) that introduces learnable prompt pairs (LPP) in an adversarial manner to perform discrimination on the more abstract output feature of CLIP, further enhancing the discriminative ability of our method. With both Feat-D and TG-D, our SFD can effectively distinguish between the semantic feature distributions of low-quality and high-quality images, encouraging the SRN to generate more realistic and semantically relevant textures. Furthermore, based on the trained Feat-D and LPP, we propose a novel opinion-unaware no-reference image quality assessment (OU NR-IQA) method, SFD-IQA, which greatly improves OU NR-IQA performance without any additional targeted training. Extensive experiments on classical SISR, real-world SISR, and OU NR-IQA tasks demonstrate the effectiveness of our proposed methods.
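To make the idea of discriminating with a prompt pair more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the CLIP output feature of an image is compared against one "high-quality" and one "low-quality" prompt embedding by cosine similarity, and that a softmax over the two similarities gives a realness score. The function names, the temperature value, and the toy 2-D vectors are all hypothetical; the abstract does not specify the exact scoring formula.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prompt_pair_score(image_feat, pos_prompt, neg_prompt, temperature=0.01):
    """Softmax probability that image_feat matches the positive prompt.

    Assumption: pos_prompt and neg_prompt stand in for the text embeddings
    of one learnable prompt pair; temperature is a hypothetical value.
    """
    s_pos = cosine(image_feat, pos_prompt) / temperature
    s_neg = cosine(image_feat, neg_prompt) / temperature
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(s_pos, s_neg)
    e_pos = math.exp(s_pos - m)
    e_neg = math.exp(s_neg - m)
    return e_pos / (e_pos + e_neg)


# Toy example: a feature aligned with the positive prompt scores near 1,
# and a feature equally similar to both prompts scores 0.5.
aligned = prompt_pair_score([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
ambiguous = prompt_pair_score([1.0, 1.0], [1.0, 0.0], [0.0, 1.0])
```

In an adversarial setup of this kind, the prompt embeddings would presumably be updated to separate SR features from high-quality features, while the SRN would be updated to push its score toward the positive prompt; the exact losses are not given in the abstract.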

Paper Structure

This paper contains 16 sections, 10 equations, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Perception-distortion trade-off comparison between our proposed SFD and other SOTA GAN-based SISR methods.
  • Figure 2: Relative PLCC comparison between the proposed SFD-IQA and other OU NR-IQA methods.
  • Figure 3: (a) Vanilla GAN-based SISR methods: performing discrimination on images; (b) Our proposed SFD: performing discrimination on the semantic features. Vanilla GAN-based SISR typically uses an image-wise, patch-wise, or pixel-wise discriminator to determine whether the distribution of SR images aligns with that of high-quality real-world images, ignoring the semantic information of images. Such coarse-grained discrimination makes it challenging for the SR network to reconstruct fine-grained, semantically relevant textures. In contrast, we propose to perform discrimination on the semantic features, encouraging the SR network to generate more realistic semantic textures.
  • Figure 4: Framework of the proposed SFD. SFD consists of a feature discriminator (Feat-D) and a text-guided discrimination (TG-D) method. Feat-D and TG-D perform discrimination on the middle features and the final output features from CLIP, respectively.
  • Figure 5: Framework of the proposed SFD-IQA. SFD-IQA is built directly on the well-trained Feat-D and LPP and does not require any additional task-specific training on IQA datasets.
  • ...and 4 more figures
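
Figure 2 reports results in terms of PLCC, the Pearson linear correlation coefficient between predicted quality scores and human opinion scores. The sketch below shows how plain PLCC is computed; it is only an illustration with made-up numbers. It does not reproduce the "relative PLCC" of Figure 2, whose normalization is not described here, and it omits the nonlinear score mapping that some IQA evaluation protocols apply before computing PLCC.

```python
import math


def plcc(predicted, mos):
    """Pearson linear correlation between predictions and opinion scores."""
    n = len(predicted)
    if n != len(mos) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_p = sum(predicted) / n
    mean_m = sum(mos) / n
    # Covariance and variances (the common 1/n factor cancels in the ratio).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, mos))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in mos)
    return cov / math.sqrt(var_p * var_m)


# Hypothetical scores: a perfectly linear relation gives a PLCC of 1,
# and a perfectly inverted one gives -1.
perfect = plcc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
inverted = plcc([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

A value close to 1 means the predicted scores track the human ratings linearly; values near 0 indicate no linear relationship.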