JPEG Compliant Compression for Both Human and Machine, A Report
Linfeng Ye
TL;DR
This work tackles JPEG compression's adverse impact on DNN accuracy by formulating a multi-objective rate-distortion problem that jointly optimizes human perceptual distortion and machine (DNN) performance. It introduces a sensitivity-based distortion framework, including Offline Sensitivity Estimation and Adaptive Sensitivity Mapping, and defines Human and Machine-Oriented Error (HMOE) to couple DCT-coefficient distortion with a surrogate loss through a tunable parameter $\lambda$. Building on this, HMOSDQ extends Soft Decision Quantization to color JPEG, jointly optimizing run-length coding, Huffman tables, and quantization while remaining JPEG compatible. Experimental results on ImageNet subsets with AlexNet and VGG-16 show that HMOSDQ improves on default JPEG in both rate-accuracy and rate-distortion performance, with PSNR gains of up to 2.1 dB and, for AlexNet, an accuracy improvement of 0.81% at 0.61 BPP, or equivalently a reduction of the compression rate of default JPEG by up to 9.6× at the same validation accuracy. These findings suggest a practical pathway to JPEG codecs that preserve both human viewing quality and DNN reliability in real-world deployments.
Abstract
Deep Neural Networks (DNNs) have become an integral part of our daily lives, especially in vision-related applications. However, conventional lossy image compression algorithms are primarily designed for the Human Vision System (HVS), so compression can non-trivially degrade DNN validation accuracy, as noted in \cite{liu2018deepn}. Developing an image compression algorithm that serves both humans and machines (DNNs) is therefore a natural next step. To address this challenge, in this paper we first formulate image compression as a multi-objective optimization problem that takes both human and machine perspectives into account. We then solve it by linear combination and propose a novel distortion measure for both human and machine, dubbed Human and Machine-Oriented Error (HMOE). Based on HMOE, we develop Human and Machine-Oriented Soft Decision Quantization (HMOSDQ), a lossy image compression algorithm for both human and machine (DNNs) that is fully compliant with the JPEG format. To evaluate HMOSDQ, we conduct experiments with two well-known pre-trained DNN-based image classifiers, AlexNet \cite{Alexnet} and VGG-16 \cite{simonyan2014VGG}, on two subsets of the ImageNet \cite{deng2009imagenet} validation set: one subset contains images whose shorter side is in the range of 496 to 512, and the other contains images whose shorter side is in the range of 376 to 384. Our results demonstrate that HMOSDQ outperforms the default JPEG algorithm in terms of rate-accuracy and rate-distortion performance. For AlexNet, compared with the default JPEG algorithm, HMOSDQ improves the validation accuracy by more than $0.81\%$ at $0.61$ BPP, or equivalently reduces the compression rate of default JPEG by $9.6\times$ while maintaining the same validation accuracy.
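To illustrate what solving a multi-objective problem by linear combination can look like, the following minimal sketch combines a human-oriented distortion term and a machine-oriented distortion term into one scalar using a weight `lam`. It is not the HMOE definition, which this section does not state. The function names, the plain squared error as the human term, the per-coefficient `sensitivity` weights as the machine term, and the uniform quantizer are all hypothetical choices made for illustration.

```python
# Hypothetical sketch of linear scalarization of two distortion objectives.
# The exact form of HMOE is NOT given here; every name and formula below
# is an illustrative assumption.

def squared_error(original, quantized):
    """Human-oriented term (assumed): plain squared error over coefficients."""
    return sum((o - q) ** 2 for o, q in zip(original, quantized))


def sensitivity_weighted_error(original, quantized, sensitivity):
    """Machine-oriented term (assumed): squared error weighted per coefficient
    by how strongly a DNN loss is assumed to react to that coefficient."""
    return sum(s * (o - q) ** 2
               for o, q, s in zip(original, quantized, sensitivity))


def combined_distortion(original, quantized, sensitivity, lam):
    """Linear combination of the two objectives; lam = 0 ignores the machine term."""
    human = squared_error(original, quantized)
    machine = sensitivity_weighted_error(original, quantized, sensitivity)
    return human + lam * machine


def quantize(coeffs, step):
    """Uniform quantization followed by dequantization (assumed quantizer)."""
    return [step * round(c / step) for c in coeffs]


if __name__ == "__main__":
    coeffs = [11.0, -3.0, 1.0]       # toy DCT coefficients
    sensitivity = [0.5, 2.0, 0.0]    # toy per-coefficient sensitivities
    recon = quantize(coeffs, step=4)
    for lam in (0.0, 2.0):
        print(lam, combined_distortion(coeffs, recon, sensitivity, lam))
```

Increasing `lam` shifts the combined cost toward coefficients with high assumed sensitivity, which is the general trade-off a tunable weight exposes in a scalarized objective.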
