Deep Generative Model based Rate-Distortion for Image Downscaling Assessment

Yuanbang Liang, Bhavesh Garg, Paul L Rosin, Yipeng Qin

TL;DR

IDA-RD introduces a rate-distortion-inspired metric for evaluating image downscaling algorithms without ground-truth LR references. It uses blind, stochastic super-resolution models (e.g., StyleGAN inversion and SRFlow) to reconstruct a conditional HR distribution, and measures distortion in HR space as $S(f_{ds}) = \mathbb{E}[D_Q(X,\hat{X})]$ under $I_Q(X;\hat{X}) \le R$. The approach is validated on synthetic degradations and real downscaling methods. The experiments show that greater information loss (from larger scale factors or stronger degradations) yields higher IDA-RD scores, and that perceptual improvements do not necessarily imply information preservation. IDA-RD therefore provides complementary, quantitative insight into downscaling performance, with practical implications for designing robust downscaling algorithms and for evaluating them across diverse content and domains.
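
In practice the expectation above has to be estimated from samples. One plausible Monte Carlo form, offered here as an assumption rather than as the paper's stated estimator, averages a distortion over several HR test images and several stochastic reconstructions of each. The symbols $N$ (number of HR images $x_i$), $M$ (reconstructions per image), $d$ (a per-pair distortion), and $\hat{x}_{ij}$ are introduced only for illustration; $f_{us}$ denotes the stochastic SR (upscaling) model.

```latex
\hat{S}(f_{ds}) = \frac{1}{NM} \sum_{i=1}^{N} \sum_{j=1}^{M} d\bigl(x_i, \hat{x}_{ij}\bigr),
\qquad \hat{x}_{ij} \sim f_{us}\bigl(\cdot \mid f_{ds}(x_i)\bigr)
```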

Abstract

In this paper, we propose Image Downscaling Assessment by Rate-Distortion (IDA-RD), a novel measure to quantitatively evaluate image downscaling algorithms. In contrast to image-based methods that measure the quality of downscaled images, ours is a process-based measure that draws on rate-distortion theory to quantify the distortion incurred during downscaling. Our main idea is that downscaling and super-resolution (SR) can be viewed as the encoding and decoding processes in the rate-distortion model, respectively, and that a downscaling algorithm that preserves more details in the resulting low-resolution (LR) images should lead to less distorted high-resolution (HR) images in SR. In other words, the distortion should increase as the downscaling algorithm deteriorates. However, measuring this distortion is non-trivial because it requires the SR algorithm to be both blind and stochastic. Our key insight is that these requirements can be met by recent SR algorithms based on deep generative models, which can find all matching HR images for a given LR image on their learned image manifolds. Extensive experimental results show the effectiveness of our IDA-RD measure.
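
The encode/decode view described above can be illustrated with a minimal sketch. Everything in it is a toy stand-in and not the authors' implementation: images are nested lists of floats, `toy_stochastic_upscale` (pixel replication plus Gaussian noise) is a hypothetical placeholder for a blind, stochastic SR model such as GAN inversion or SRFlow, and mean squared error is an assumed substitute for the distortion measure. All function names are invented for this example.

```python
import random
from statistics import mean


def box_downscale(img, k):
    """Toy f_ds: average non-overlapping k x k blocks (assumes sizes divisible by k)."""
    h, w = len(img), len(img[0])
    return [[mean(img[y + dy][x + dx] for dy in range(k) for dx in range(k))
             for x in range(0, w, k)]
            for y in range(0, h, k)]


def flat_downscale(img, k):
    """Deliberately poor f_ds: every LR pixel is the global mean of the HR image."""
    g = mean(v for row in img for v in row)
    return [[g] * (len(img[0]) // k) for _ in range(len(img) // k)]


def toy_stochastic_upscale(lr, k, rng, sigma=0.05):
    """Hypothetical stand-in for a stochastic SR model: replicate pixels, add noise."""
    return [[lr[y // k][x // k] + rng.gauss(0.0, sigma)
             for x in range(len(lr[0]) * k)]
            for y in range(len(lr) * k)]


def mse(a, b):
    """Assumed distortion: mean squared error between two equally sized images."""
    return mean((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def ida_rd_score(hr_images, f_ds, f_us, distortion, k, n_samples=8, seed=0):
    """Average distortion between each HR image and sampled reconstructions of its LR version."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same score
    per_image = []
    for x in hr_images:
        lr = f_ds(x, k)  # "encode": downscale the HR image
        # "decode": draw several HR reconstructions and average their distortion
        per_image.append(mean(distortion(x, f_us(lr, k, rng)) for _ in range(n_samples)))
    return mean(per_image)


# One 8x8 test image: a checkerboard made of constant 2x2 blocks.
hr_images = [[[float((x // 2 + y // 2) % 2) for x in range(8)] for y in range(8)]]

score_box = ida_rd_score(hr_images, box_downscale, toy_stochastic_upscale, mse, k=2)
score_flat = ida_rd_score(hr_images, flat_downscale, toy_stochastic_upscale, mse, k=2)
print(score_box < score_flat)
```

In this toy setting the box filter keeps the block structure of the test image, while the flat downscaler discards it, so the second score should come out larger. That mirrors the claim that distortion increases as the downscaling algorithm deteriorates.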

Paper Structure

This paper contains 29 sections, 5 equations, 4 figures, and 21 tables.

Figures (4)

  • Figure 1: Illustration of the proposed IDA-RD measure. Given a downscaling method $f_{ds}$ to be evaluated, i) we first use it to downscale several HR images; ii) then, we upscale them back to the original resolution with $f_{us}$ and measure their distortion relative to the corresponding HR images. The upscaling method leverages the recent success of deep generative models and thus can i) apply to arbitrarily downscaled images and ii) output a manifold of HR images that captures the conditional distribution given a downscaled image.
  • Figure 1: SRFlow becomes unstable for a scaling factor of 8$\times$ on real-world datasets, e.g., DIV2K (Row 1), while such cases never occur for domain-specific datasets, e.g., FFHQ (Row 2). From left to right, the downscaling methods are N.N., DPID, Perceptual, and $L0$-reg., respectively.
  • Figure 2: Examples of images ($\times$8) from FFHQ, DIV2K and Flickr30K datasets downscaled by real-world image downscaling methods. (a) Bicubic (b) Bilinear (c) Nearest Neighbor (N.N.) (d) DPID (e) Perceptual (f) $L0$-regularized
  • Figure 3: Qualitative evaluation of existing image downscaling methods. Original: the input HR image; LR: the downscaled LR image; SR1, SR2, SR3: three instances of upscaled images; MD1, MD2, MD3: difference map visualizations of (SR1, Original), (SR2, Original), and (SR3, Original), respectively. The white numbers in the top-left corners are the corresponding LPIPS scores of the difference map visualizations. State-of-the-art image downscaling methods (DPID, Perceptual and $L0$-reg.) achieve better perceptual quality by "exaggerating" perceptually important features in the original image (e.g., building lights, water reflections), thus leading to over-exaggeration in the upscaled images and lower IDA-RD scores.