Exploring Compressed Image Representation as a Perceptual Proxy: A Study

Chen-Hsiu Huang; Ja-Ling Wu

TL;DR

This study affirms that the compressed latent representation can predict human perceptual distance judgments with an accuracy comparable to a custom-tailored DNN-based quality metric.

Abstract

We propose an end-to-end learned image compression codec in which the analysis transform is jointly trained with an object classification task. This study affirms that the compressed latent representation can predict human perceptual distance judgments with an accuracy comparable to a custom-tailored DNN-based quality metric. We further investigate various neural encoders and demonstrate the effectiveness of employing the analysis transform as a perceptual loss network for image tasks beyond quality judgments. Our experiments show that an off-the-shelf neural encoder is proficient at perceptual modeling without needing an additional VGG network. We expect this research to serve as a valuable reference for developing a semantic-aware and coding-efficient neural encoder.
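The idea of using encoder features as a perceptual distance can be illustrated with a minimal sketch. It assumes an LPIPS-style formulation (unit-normalize the channel vector at each spatial position, then take a weighted squared difference averaged over positions); the paper's actual metric may differ. The function `toy_analysis_transform` is a hypothetical, hand-written stand-in for a learned analysis transform $g_a$, and the `weights` are placeholder values rather than trained parameters.

```python
import math


def toy_analysis_transform(image):
    """Hypothetical stand-in for a learned analysis transform g_a.

    Maps a 2-D grayscale image (list of rows) to three feature channels
    computed over non-overlapping 2x2 blocks: block mean, horizontal
    difference, and vertical difference. Each channel is a flat list.
    """
    height, width = len(image), len(image[0])
    mean, dx, dy = [], [], []
    for i in range(0, height - 1, 2):
        for j in range(0, width - 1, 2):
            a, b = image[i][j], image[i][j + 1]
            c, d = image[i + 1][j], image[i + 1][j + 1]
            mean.append((a + b + c + d) / 4.0)
            dx.append((b + d - a - c) / 2.0)
            dy.append((c + d - a - b) / 2.0)
    return [mean, dx, dy]


def unit_normalize(features, eps=1e-10):
    """Scale the channel vector at each spatial position to unit length."""
    n_pos = len(features[0])
    out = [[0.0] * n_pos for _ in features]
    for p in range(n_pos):
        norm = math.sqrt(sum(ch[p] ** 2 for ch in features)) + eps
        for c, ch in enumerate(features):
            out[c][p] = ch[p] / norm
    return out


def perceptual_distance(x, y, weights):
    """Weighted squared feature difference, averaged over positions."""
    fx = unit_normalize(toy_analysis_transform(x))
    fy = unit_normalize(toy_analysis_transform(y))
    n_pos = len(fx[0])
    total = 0.0
    for w, cx, cy in zip(weights, fx, fy):
        total += w * sum((a - b) ** 2 for a, b in zip(cx, cy))
    return total / n_pos


# Placeholder inputs: two 4x4 test patterns and uniform channel weights.
reference = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]
checker = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
weights = [1.0, 1.0, 1.0]

print(perceptual_distance(reference, reference, weights))
print(perceptual_distance(reference, checker, weights))
```

In a real setting the stand-in transform would be replaced by the encoder of a trained codec, and the channel weights would be fitted to human similarity judgments.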

Paper Structure (16 sections, 2 equations, 4 figures, 4 tables)

Introduction
Related Works
Proposed Method
Experimental Results
Conclusion
References

Figures (4)

Figure 1: (a) The use of semantic and perceptual transforms for object classification and image similarity judgment. (b) The use of an analysis transform as a proxy for perceptual transform: $\rho_p=\omega(g_a(x);w)$.
Figure 2: The enhanced network architecture of CPIPS. We use the parameterized ReLU and the Generalized Divisive Normalization (GDN) (Ballé et al., 2018) as the activation functions. The convolution notation $n\times k\times k/s$ denotes the number of filters, the kernel size, and the stride.
Figure 3: Qualitative comparisons of style transfer. Snapshots of the transferred images (starting from the third column) show that the raw hyperprior codec struggles to transfer high-level features as styles.
Figure 4: Visual comparisons of SRGAN 4X super-resolution.