Learning Single-Image Super-Resolution in the JPEG Compressed Domain

Sruthi Srinivasan, Elham Shakibapour, Rajy Rawther, Mehdi Saeedi

TL;DR

This work targets the data-loading bottleneck in deep learning for image restoration by prototyping an end-to-end single-image super-resolution pipeline that operates directly on JPEG DCT coefficients, bypassing full JPEG decoding. The authors introduce FreqSR, a lightweight frequency-domain architecture that processes 64 DCT channels (one per frequency coefficient of each 8×8 block) with depth-wise residual blocks. The model focuses on the luminance (Y) channel, and chrominance is upsampled in post-processing. Empirical results show data-loading and end-to-end training speedups of approximately 2.6× and 2.5×, respectively, with visual quality comparable to RGB-based baselines, which supports the practicality of compressed-domain learning in resource-constrained settings. The work points to directions for edge AI, including extensions to video SR and other codecs, and notes that misalignment artifacts and robustness in compressed-domain restoration still need to be addressed.

Abstract

Deep learning models have grown increasingly complex, with input data sizes scaling accordingly. Despite substantial advances in specialized deep learning hardware, data loading continues to be a major bottleneck that limits training and inference speed. To address this challenge, we propose training models directly on encoded JPEG features, reducing the computational overhead associated with full JPEG decoding and significantly improving data loading efficiency. While prior works have focused on recognition tasks, we investigate the effectiveness of this approach for the restoration task of single-image super-resolution (SISR). We present a lightweight super-resolution pipeline that operates on JPEG discrete cosine transform (DCT) coefficients in the frequency domain. Our pipeline achieves a 2.6x speedup in data loading and a 2.5x speedup in training, while preserving visual quality comparable to standard SISR approaches.
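As a rough illustration of the frequency-domain input described above, the sketch below rearranges the 8×8 block DCT coefficients of a luminance plane into 64 channels at 1/8 of the spatial resolution. It is a minimal NumPy sketch under stated assumptions, not the authors' data loader: a real compressed-domain pipeline would read the coefficients from the JPEG bitstream after entropy decoding instead of recomputing them from pixels, and the function names, the row-major channel order, and the omission of quantization are all choices made here for illustration.

```python
import numpy as np

def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n).reshape(-1, 1)   # frequency index (rows)
    x = np.arange(n).reshape(1, -1)   # sample index (columns)
    m = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0, :] = np.sqrt(1.0 / n)        # DC row uses a different scale factor
    return m

def blockwise_dct_channels(y: np.ndarray, block: int = 8) -> np.ndarray:
    """Map an (H, W) luminance plane to (block*block, H/block, W/block).

    Hypothetical helper: channel c holds frequency (c // block, c % block)
    of every block. Row-major order is an assumption; JPEG files store
    coefficients in zigzag order.
    """
    h, w = y.shape
    assert h % block == 0 and w % block == 0, "pad the image to a multiple of the block size"
    d = dct_matrix(block)
    # Split into non-overlapping blocks: (H/b, b, W/b, b) -> (H/b, W/b, b, b)
    blocks = y.reshape(h // block, block, w // block, block).transpose(0, 2, 1, 3)
    # Separable 2-D DCT of every block: D @ B @ D^T
    coeffs = d @ blocks @ d.T
    # Flatten each block's b x b frequencies into channels, channels first
    return coeffs.reshape(h // block, w // block, block * block).transpose(2, 0, 1)

# Example: a 32x48 luminance plane, level-shifted by 128 as JPEG does
rng = np.random.default_rng(0)
y_plane = rng.uniform(0, 255, size=(32, 48)) - 128.0
channels = blockwise_dct_channels(y_plane)
print(channels.shape)  # (64, 4, 6)
```

Because each channel has 1/8 of the height and width of the pixel image, convolutions over this representation touch 64 times fewer spatial positions per channel, which is the sense in which the model operates on reduced spatial dimensions.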

Paper Structure

This paper contains 13 sections, 4 figures, 2 tables.

Figures (4)

  • Figure 1: JPEG decompression pipeline showing the use of DCT coefficients without full decoding to an RGB image.
  • Figure 2: The proposed end-to-end SISR pipeline operates on JPEG DCT coefficients as frequency-domain input.
  • Figure 3: Overview of the FreqSR model architecture. The architecture uses a convolutional layer for feature extraction, depth-wise residual blocks for independent channel processing, and standard residual blocks for cross-channel refinement. Operating on reduced spatial dimensions, the model achieves faster training.
  • Figure 4: Comparison of HR images generated from the Set5 dataset. (a)-(c) and (d)-(f) show two example model outputs from EDSR RGB, EDSR Y, and our proposed SISR pipeline based on the FreqSR model, respectively. PSNR values are computed on 220×220 center crops of the HR outputs. The inference results of the FreqSR model are visually similar to those of the other methods.
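The Figure 3 caption distinguishes depth-wise residual blocks (independent channel processing) from standard residual blocks (cross-channel refinement). The NumPy sketch below shows that difference on a 64-channel DCT tensor. It is only an illustration: the 3×3 kernel size, the ReLU activation, the two-convolution block layout, and all function names are assumptions made here, not details taken from the paper.

```python
import numpy as np

def conv3x3_depthwise(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """x: (C, H, W), k: (C, 3, 3). Each channel is filtered by its own kernel only."""
    c, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W to keep the size
    out = np.zeros((c, h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += k[:, dy, dx, None, None] * xp[:, dy:dy + h, dx:dx + w]
    return out

def conv3x3_standard(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    """x: (C_in, H, W), k: (C_out, C_in, 3, 3). Every output channel mixes all inputs."""
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((k.shape[0], h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += np.einsum("oi,ihw->ohw", k[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + w])
    return out

def residual_block(x, conv, k1, k2):
    """Assumed layout: conv -> ReLU -> conv, plus an identity skip connection."""
    hidden = np.maximum(conv(x, k1), 0.0)
    return x + conv(hidden, k2)

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 4, 6))                 # 64 DCT channels on a 4x6 block grid
dw_k1, dw_k2 = rng.standard_normal((2, 64, 3, 3))   # 64 * 9 = 576 weights per conv
st_k1, st_k2 = rng.standard_normal((2, 64, 64, 3, 3)) * 0.05  # 64 * 64 * 9 = 36864 weights per conv

y_dw = residual_block(x, conv3x3_depthwise, dw_k1, dw_k2)
y_st = residual_block(x, conv3x3_standard, st_k1, st_k2)
print(y_dw.shape, y_st.shape)  # (64, 4, 6) (64, 4, 6)
```

In this sketch a depth-wise 3×3 convolution over 64 channels uses 576 weights, against 36,864 for a standard one, and a change to one input channel affects only the matching output channel. That is one plausible reading of why the caption pairs depth-wise blocks for per-frequency processing with standard blocks for cross-channel refinement.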