
Differentiable JPEG: The Devil is in the Details

Christoph Reich, Biplob Debnath, Deep Patel, Srimat Chakradhar

TL;DR

The paper addresses the non-differentiability of JPEG, which prevents its use in gradient-based learning. It introduces a differentiable JPEG approach that models the core JPEG coding steps and provides gradients with respect to the input image, the JPEG quality, the quantization tables, and the color conversion parameters, and it includes a straight-through estimator (STE) variant. Forward and backward evaluations and ablations show that the approach approximates the reference JPEG implementation more closely than prior differentiable approaches across compression strengths (by 3.47 dB PSNR on average and by 9.51 dB for strong compression) and that its gradients are more effective for optimization and adversarial attacks. This makes JPEG easier to integrate into deep learning pipelines.

Abstract

JPEG remains one of the most widespread lossy image coding methods. However, the non-differentiable nature of JPEG restricts the application in deep learning pipelines. Several differentiable approximations of JPEG have recently been proposed to address this issue. This paper conducts a comprehensive review of existing diff. JPEG approaches and identifies critical details that have been missed by previous methods. To this end, we propose a novel diff. JPEG approach, overcoming previous limitations. Our approach is differentiable w.r.t. the input image, the JPEG quality, the quantization tables, and the color conversion parameters. We evaluate the forward and backward performance of our diff. JPEG approach against existing methods. Additionally, extensive ablations are performed to evaluate crucial design choices. Our proposed diff. JPEG resembles the (non-diff.) reference implementation best, significantly surpassing the recent-best diff. approach by $3.47$ dB (PSNR) on average. For strong compression rates, we can even improve PSNR by $9.51$ dB. Strong adversarial attack results are yielded by our diff. JPEG, demonstrating the effective gradient approximation. Our code is available at https://github.com/necla-ml/Diff-JPEG.
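To make the non-differentiability concrete: JPEG quantization rounds scaled DCT coefficients to integers, and the derivative of rounding is zero almost everywhere. The sketch below contrasts hard rounding with two common surrogates, a polynomial approximation of the form round(x) + (x - round(x))^3 and a straight-through estimator (STE) that keeps hard rounding in the forward pass and uses a gradient of 1 in the backward pass. This is a minimal illustration under stated assumptions, not the paper's implementation; all function names are hypothetical, and the gradients are written by hand rather than obtained from an autodiff library.

```python
def hard_quantize(coefficient: float, step: float) -> float:
    """Standard quantization: round to the nearest multiple of `step`.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return step * round(coefficient / step)


def poly_round(x: float) -> float:
    """Polynomial surrogate for rounding: round(x) + (x - round(x))**3."""
    nearest = float(round(x))
    return nearest + (x - nearest) ** 3


def poly_round_grad(x: float) -> float:
    """Derivative of poly_round with respect to x (treating round(x) as constant)."""
    nearest = float(round(x))
    return 3.0 * (x - nearest) ** 2


def soft_quantize(coefficient: float, step: float) -> float:
    """Quantization with the polynomial surrogate in place of hard rounding."""
    return step * poly_round(coefficient / step)


def soft_quantize_grad(coefficient: float, step: float) -> float:
    """d soft_quantize / d coefficient; the factors `step` and `1 / step` cancel."""
    return poly_round_grad(coefficient / step)


def ste_quantize_grad(coefficient: float, step: float) -> float:
    """Straight-through estimator: pretend rounding is the identity in the backward pass."""
    return 1.0
```

For example, with a quantization step of 10, `hard_quantize(37.0, 10.0)` returns 40.0, while `soft_quantize(37.0, 10.0)` returns roughly 39.73 and carries a nonzero gradient of about 0.27 with respect to the coefficient.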

Paper Structure (46 sections, 3 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Qualitative approximation results. For a JPEG quality of 50, both Shin et al. Shin2017 and our differentiable JPEG approach approximate the standard JPEG coding well. When reducing the JPEG quality to 1, the approach by Shin et al. does not approximate the JPEG coding well, while our differentiable JPEG still leads to a strong approximation. Structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) measured w.r.t. the coded image of the (non-differentiable) reference JPEG implementation (OpenCV Bradski2000).
  • Figure 2: The JPEG encoding-decoding pipeline. The original input image is encoded to a JPEG file in a lossy manner. To recover the coded image the encoding is reversed in the decoding. JPEG uses lossless coding in conjunction with lossy coding. Since no information is lost (identity mapping) during the lossless encoding/decoding we can neglect these coding steps in our differentiable JPEG approach.
  • Figure 3: JPEG coding artifacts. (a) Original image, (b) JPEG-coded image with a JPEG quality of 50 (file size 47.3 kB), and (c) coded image with a JPEG quality of 1 (file size 6.2 kB). Image from Set14 Zeyde2012; OpenCV Bradski2000 JPEG used.
  • Figure 4: Forward function performance. Performance of approximating the reference JPEG implementation (OpenCV Bradski2000) for different JPEG qualities. Mean & one standard deviation shown.
  • Figure 5: Forward function performance for strong compression. Performance of approximating the reference JPEG implementation (OpenCV Bradski2000) for low JPEG qualities. Mean & one standard deviation shown.
  • ...and 3 more figures
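The figure captions above report PSNR measured against the output of the reference (non-differentiable) JPEG implementation rather than against the original image. A minimal sketch of that measurement follows, assuming 8-bit pixel values flattened into plain Python sequences; the function name and the `max_value` default are illustrative assumptions, not taken from the paper's code.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], approximation: Sequence[float],
         max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel sequences.

    `reference` would be the image coded by the reference JPEG implementation,
    `approximation` the output of a differentiable JPEG at the same quality.
    """
    if len(reference) != len(approximation) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - a) ** 2 for r, a in zip(reference, approximation)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)
```

A higher value means the differentiable approximation is closer to what the reference codec produces; identical outputs give an infinite PSNR.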