Textural-Perceptual Joint Learning for No-Reference Super-Resolution Image Quality Assessment

Yuqing Liu, Qi Jia, Shanshe Wang, Siwei Ma, Wen Gao

TL;DR

TPNet is a dual-stream network that jointly explores textural and perceptual information for quality assessment. It uses spatial attention to make visually sensitive information more distinguishable and feature normalization (F-Norm) to boost the network representation.

Abstract

Image super-resolution (SR) has been widely investigated in recent years. However, it is challenging to fairly estimate the performance of various SR methods, owing to the lack of reliable and accurate criteria for perceptual quality. Existing metrics concentrate on one specific kind of degradation without distinguishing the visually sensitive areas, and therefore cannot describe the diverse SR degradations in both low-level textural and high-level perceptual information. In this paper, we focus on the textural and perceptual degradation of SR images and design a dual-stream network, dubbed TPNet, to jointly explore the textural and perceptual information for quality assessment. Mimicking the human vision system (HVS), which pays more attention to the significant image areas, we develop spatial attention to make the visually sensitive information more distinguishable and utilize feature normalization (F-Norm) to boost the network representation. Experimental results show that TPNet predicts the visual quality score more accurately than other methods and demonstrates better consistency with human perception. The source code will be available at http://github.com/yuqing-liu-dut/NRIQA_SR

Paper Structure (12 sections, 4 equations, 4 figures, 4 tables)

Figures (4)

  • Figure 1: Comparisons among different IQA metrics. (a) Original image. (b) Noisy image. (c) Blurred image. (d) Super-resolved image. A higher prediction score denotes better quality. The proposed SR IQA method (TPNet) describes the image quality more consistently with the visual experience.
  • Figure 2: The architecture of TPNet. The perceptual branch utilizes a VGG-19 extractor to explore the high-level information. The textural branch stacks the proposed residual SR blocks to explore the low-level information.
  • Figure 3: Design of Spatial Attention (SA).
  • Figure 4: Visualized attention maps of different images. (a) Input image. (b)-(e) The learned attention from stage $i=3$ to $6$. All attention maps are normalized to the range $0$ to $1$. Red areas indicate higher attention values, and blue areas indicate lower attention values. The visually sensitive areas become more distinguishable as the stage index increases.
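
The captions of Figures 3 and 4 mention spatial attention and attention maps normalized to the range 0 to 1. The following is a minimal sketch of those two ideas, not the authors' implementation: it assumes a generic sigmoid-gated form of spatial attention and min-max rescaling for visualization, neither of which is specified in the text above. The function names `spatial_attention` and `minmax_normalize` and the single-channel list-of-lists layout are hypothetical choices made for illustration.

```python
import math


def spatial_attention(features, scores):
    """Re-weight each spatial position of a feature map by a sigmoid gate.

    features: H x W list of lists of floats (one channel of a feature map).
    scores:   H x W raw attention logits. Assumption: in a real network these
              would come from a learned layer; here they are plain inputs.
    """
    gated = []
    for f_row, s_row in zip(features, scores):
        # sigmoid(s) lies in (0, 1), so each position is scaled down by its gate
        gated.append([f * (1.0 / (1.0 + math.exp(-s)))
                      for f, s in zip(f_row, s_row)])
    return gated


def minmax_normalize(attn, eps=1e-12):
    """Rescale an attention map to [0, 1], e.g. before color-mapping it."""
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span < eps:
        # A constant map carries no contrast; return all zeros by convention.
        return [[0.0 for _ in row] for row in attn]
    return [[(v - lo) / span for v in row] for row in attn]


if __name__ == "__main__":
    attn = [[1.0, 3.0], [2.0, 5.0]]
    print(minmax_normalize(attn))
    print(spatial_attention([[2.0]], [[0.0]]))
```

With min-max rescaling, the largest value in each map becomes 1 and the smallest becomes 0, so maps from different stages can share one color scale (red for high, blue for low) regardless of their raw magnitudes.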