Machine Perceptual Quality: Evaluating the Impact of Severe Lossy Compression on Audio and Image Models

Dan Jacobellis; Daniel Cummings; Neeraja J. Yadwadkar

TL;DR

The paper investigates how severe lossy compression affects machine perception across image and audio tasks. It systematically evaluates six datasets and seven compression methods (conventional, neural, and generative) on pre-trained models for classification, segmentation, speech recognition, and music separation, using rate, PSNR, and deep similarity metrics such as LPIPS and CDPAM. Key findings show that generative compression can preserve machine perceptual quality at very high compression levels, deep similarity metrics align with downstream performance, and pretraining on lossy data can counterintuitively improve results in some cases. These insights inform the design of machine-oriented codecs and data storage strategies, with practical implications for bandwidth-limited and large-scale sensing environments; code for the experiments is publicly available.

Abstract

In the field of neural data compression, the prevailing focus has been on optimizing algorithms for either classical distortion metrics, such as PSNR or SSIM, or human perceptual quality. With increasing amounts of data consumed by machines rather than humans, a new paradigm of machine-oriented compression, which prioritizes the retention of features salient for machine perception over traditional human-centric criteria, has emerged, creating several new challenges for the development, evaluation, and deployment of systems utilizing lossy compression. In particular, it is unclear how different approaches to lossy compression will affect the performance of downstream machine perception tasks. To address this under-explored area, we evaluate various perception models (including image classification, image segmentation, speech recognition, and music source separation) under severe lossy compression. We utilize several popular codecs spanning conventional, neural, and generative compression architectures. Our results indicate three key findings: (1) using generative compression, it is feasible to leverage highly compressed data while incurring a negligible impact on machine perceptual quality; (2) machine perceptual quality correlates strongly with deep similarity metrics, indicating a crucial role of these metrics in the development of machine-oriented codecs; and (3) using lossy compressed datasets (e.g., ImageNet) for pre-training can lead to counter-intuitive scenarios where lossy compression increases machine perceptual quality rather than degrading it. To encourage engagement in this growing area of research, our code and experiments are available at: https://github.com/danjacobellis/MPQ.

Paper Structure (7 sections, 2 figures, 4 tables)

Models and datasets.
Compression methods.
Evaluation metrics.
Generative compression preserves machine perceptual quality.
Correlation of machine perceptual quality with deep similarity metrics.
Pretraining on lossy datasets.
Limitations and Future Directions.

Figures (2)

Figure 1: Visual comparison of image compression methods. The original ImageNet image is JPEG compressed at a near-lossless quality level of 96 (5.1 BPP), while the original chest X-ray and bean disease images are lossless.
Figure 2: Performance on various machine perception tasks when using different types of lossy compression.
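
The Figure 1 caption quotes rate in bits per pixel (BPP), and the TL;DR lists rate and PSNR among the evaluation metrics. As a rough illustration of how these two quantities are conventionally defined, here is a minimal Python sketch. It is not taken from the paper's repository; the function names `bits_per_pixel` and `psnr`, and the default 8-bit peak value of 255, are assumptions made for this example.

```python
import math


def bits_per_pixel(num_bytes, width, height):
    """Rate in bits per pixel: total compressed bits divided by pixel count."""
    return 8 * num_bytes / (width * height)


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equal-length sequences of sample values.

    `peak` is the maximum possible sample value (assumed 255 for 8-bit data).
    """
    if len(reference) != len(distorted):
        raise ValueError("inputs must have the same number of samples")
    # Mean squared error between the reference and the distorted signal.
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals: no distortion
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical example: a 1000-byte file for a 100x80 image.
rate = bits_per_pixel(1000, 100, 80)
```

Lower BPP means stronger compression, and higher PSNR means lower pixel-wise distortion. The abstract's point is that classical metrics of this kind are not what machine-oriented compression optimizes for, which is why deep similarity metrics such as LPIPS and CDPAM are evaluated alongside them.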
