JPEG Inspired Deep Learning

Ahmed H. Salamah, Kaixiang Zheng, Yiwen Liu, En-Hui Yang

TL;DR

The paper challenges the conventional view that JPEG compression harms deep learning performance. It introduces JPEG-DL, a framework that places a trainable JPEG layer with a differentiable soft quantizer in front of any DNN and optimizes the two jointly. Replacing JPEG's non-differentiable quantization with a soft quantizer built on a trainable conditional probability mass function (CPMF) allows the quantization parameters and the model weights to be learned end to end as a single model. Experiments across multiple datasets and architectures show consistent accuracy gains (up to 20.9% on fine-grained tasks) and improved adversarial robustness, with only a small parameter overhead. The results suggest that a learnable JPEG front-end can act as an effective non-linear preprocessing stage, improving both performance and interpretability in vision systems.

Abstract

Although lossy image compression such as JPEG is traditionally believed to hurt the performance of deep neural networks (DNNs), recent works show that well-crafted JPEG compression can actually improve the performance of deep learning (DL). Inspired by this, we propose JPEG-DL, a novel DL framework that prepends any underlying DNN architecture with a trainable JPEG compression layer. To make the quantization operation in JPEG compression trainable, a new differentiable soft quantizer is employed at the JPEG layer, and the quantization operation and the underlying DNN are then jointly trained. Extensive experiments show that, compared with standard DL, JPEG-DL delivers significant accuracy improvements across various datasets and model architectures while enhancing robustness against adversarial attacks. In particular, on some fine-grained image classification datasets, JPEG-DL can increase prediction accuracy by as much as 20.9%. Our code is available at https://github.com/AhmedHussKhalifa/JPEG-Inspired-DL.git.

Paper Structure

This paper contains 24 sections, 14 equations, 13 figures, 11 tables.

Figures (13)

  • Figure 1: (a) The JPEG-DL framework consists of a JPEG layer followed by a standard DNN. The standard forward and inverse JPEG processes, depicted in white boxes, are fixed and not trained. The JPEG pipeline receives a conventionally preprocessed input image $x \in \mathbb{R}^{3 \times W \times H}$ sampled from the underlying task's dataset. The JPEG layer, equipped with a differentiable soft quantizer, and the underlying DNN together form a unified new DNN architecture; both are shown in blue to indicate that they are trainable. (b) As an example, ${\bm{z}}_{1,:,:}$, i.e., the DCT representation of the Y channel of an image, is shown as a tensor consisting of $B$ blocks of DCT coefficients. Each 8$\times$8 block contains $M=64$ DCT frequencies, ordered from low to high in a zigzag manner. The panel then shows how ${\bm{z}}_{1,1,:}$ and ${\bm{z}}_{1,M,:}$ are quantized by $Q_d(\cdot~; q=1, \alpha=10)$ and $Q_d(\cdot~; q=0.5, \alpha=16)$, respectively.
  • Figure 2: Illustration of $Q_u$ vs. $Q_d$ with $\alpha \in \{1, 3, 5, 10\}$, where $L$ and $q$ are set to 3 and 1, respectively.
  • Figure 3: Adversarial robustness of JPEG-DL models compared with standard DNNs, using VGG13 and Res56 on CIFAR-100 under FGSM and PGD attacks.
  • Figure 4: Initial and final quantization tables for VGG13 trained on CIFAR-100 and ResNet18 trained on CUB200, with frequency indices arranged in the default zigzag order.
  • Figure 5: Feature maps of size 56×56 after the first dense block in DenseNet-121 for the baseline and JPEG-DL models, shown alongside the original input image. The JPEG-DL model highlights the foreground (bird) more distinctly, while the baseline feature map shows less contrast, which contributes to the baseline's misclassification.
  • ...and 8 more figures
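
The captions for Figures 1 and 2 name a hard quantizer $Q_u$ and a differentiable soft quantizer $Q_d(\cdot~; q, \alpha)$ with an index range $L$, but this summary does not reproduce their formulas. The following is a minimal Python sketch under a stated assumption: that $Q_d$ is the expected reconstruction level when the levels $iq$, $i \in [-L, L]$, are weighted by a softmax over $-\alpha (z - iq)^2$. The function names `hard_quantize` and `soft_quantize` are hypothetical and are not taken from the authors' repository, and the paper's exact definition may differ.

```python
import math


def hard_quantize(z, q, L):
    """Assumed form of Q_u: snap z to the nearest multiple of q,
    with the integer index clipped to [-L, L]."""
    index = math.floor(z / q + 0.5)   # round half up
    index = max(-L, min(L, index))    # clip to the allowed index range
    return index * q


def soft_quantize(z, q, alpha, L):
    """Assumed form of Q_d: softmax-weighted average of the levels i*q.

    Weights are proportional to exp(-alpha * (z - i*q)**2), so the output
    is a smooth function of z (and of q), unlike the hard quantizer.
    """
    levels = [i * q for i in range(-L, L + 1)]
    logits = [-alpha * (z - v) ** 2 for v in levels]
    top = max(logits)                              # subtract max for stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, levels)) / total


if __name__ == "__main__":
    # Settings mirror the Figure 2 caption: L = 3, q = 1, alpha in {1, 3, 5, 10}.
    z = 0.8
    print("hard:", hard_quantize(z, q=1.0, L=3))
    for alpha in (1, 3, 5, 10):
        print("alpha =", alpha, "soft:", soft_quantize(z, q=1.0, alpha=alpha, L=3))
```

Under this assumed form, a larger $\alpha$ concentrates the softmax weight on the nearest level, so the soft output moves toward the hard quantizer's output, while a smaller $\alpha$ gives a smoother curve. Because every operation is differentiable, a framework with automatic differentiation could in principle learn $q$ and $\alpha$ by gradient descent, which is the role the abstract assigns to the trainable JPEG layer.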