Predicting Satisfied User and Machine Ratio for Compressed Images: A Unified Approach

Qi Zhang, Shanshe Wang, Xinfeng Zhang, Siwei Ma, Jingshan Pan, Wen Gao

TL;DR

This work addresses the problem of predicting perceptual quality for both humans and machines in compressed images by jointly estimating SUR and SMR. It introduces a unified deep model built on a CAFormer backbone, Difference Feature Residual Learning (DFRL), Multi-Head Attention Aggregation and Pooling (MHAAP), and an MLP-Mixer, trained via a proxy SUR task built from 14 FR-IQA models on a COCO-SMR-derived dataset. The authors pre-train the feature extractor to learn human- and machine-perception cues and then fine-tune on SUR/SMR prediction, achieving state-of-the-art MAE across multiple datasets and showing that joint learning improves both SUR and SMR. This approach provides a practical framework for optimizing compression schemes to satisfy both human observers and machine vision systems, with implications for cross-domain perceptual quality assessment.

Abstract

Nowadays, high-quality images are pursued by both humans for better viewing experience and by machines for more accurate visual analysis. However, images are usually compressed before being consumed, decreasing their quality. It is meaningful to predict the perceptual quality of compressed images for both humans and machines, which guides the optimization for compression. In this paper, we propose a unified approach to address this. Specifically, we create a deep learning-based model to predict Satisfied User Ratio (SUR) and Satisfied Machine Ratio (SMR) of compressed images simultaneously. We first pre-train a feature extractor network on a large-scale SMR-annotated dataset with human perception-related quality labels generated by diverse image quality models, which simulates the acquisition of SUR labels. Then, we propose an MLP-Mixer-based network to predict SUR and SMR by leveraging and fusing the extracted multi-layer features. We introduce a Difference Feature Residual Learning (DFRL) module to learn more discriminative difference features. We further use a Multi-Head Attention Aggregation and Pooling (MHAAP) layer to aggregate difference features and reduce their redundancy. Experimental results indicate that the proposed model significantly outperforms state-of-the-art SUR and SMR prediction methods. Moreover, our joint learning scheme of human and machine perceptual quality prediction tasks is effective at improving the performance of both.
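The abstract does not define SUR or SMR formally. As a rough illustration only, the sketch below assumes a common reading of a satisfied ratio: each observer has a personal threshold (the first distortion level at which a difference from the original is noticed), and the ratio at a given level is the fraction of observers whose threshold has not yet been reached. The function name `satisfied_ratio`, the strict `<` convention, and the threshold values are all hypothetical and not taken from the paper. The same computation would apply to machines if each vision model were assigned its own tolerance threshold, which is likewise an assumption here.

```python
import numpy as np

def satisfied_ratio(thresholds, levels):
    """Fraction of observers still satisfied at each distortion level.

    Assumption: an observer is satisfied at level q as long as q is
    strictly below that observer's own threshold.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    levels = np.asarray(levels, dtype=float)
    # Compare every level with every threshold: shape (levels, observers).
    satisfied = levels[:, None] < thresholds[None, :]
    return satisfied.mean(axis=1)

# Hypothetical per-observer thresholds and distortion levels.
jnd_thresholds = [12, 18, 18, 25, 30]
levels = [0, 10, 15, 20, 28, 40]
curve = satisfied_ratio(jnd_thresholds, levels)
print(curve)
```

Under this reading the curve can only stay flat or fall as distortion grows, which is why a single image yields a whole distribution of ratios rather than one score.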

Paper Structure

This paper contains 10 sections, 7 equations, 2 figures, 4 tables.

Figures (2)

  • Figure 1: SUR and SMR distributions of two sample images in the KonJND-1k dataset.
  • Figure 2: The proposed SUR and SMR prediction network. A CAFormer first extracts multi-layer features $F_0$ from the original image and $F_{q_k}$ from the compressed image. The initial difference features $F_\Delta$ are then passed to the Difference Feature Residual Learning (DFRL) module, which generates more discriminative difference features $F^{*}_\Delta$. Next, $F^{*}_\Delta$ is aggregated and pooled by the Multi-Head Attention Aggregation and Pooling (MHAAP) layer to obtain $F^{*,attn}_\Delta$. Finally, $F^{*,attn}_\Delta$ is concatenated with a regression token $T_\text{reg}$, fused by 4 MLP-Mixer layers, and fed into another 3-layer MLP to predict SUR and SMR.
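
The data flow in the Figure 2 caption can be sketched in NumPy to make the tensor shapes concrete. This is a minimal sketch with random weights, not the authors' implementation. Only the stage order, the 4 MLP-Mixer layers, and the 3-layer MLP head come from the caption. Everything else is an assumption: random arrays stand in for CAFormer features, $F_\Delta$ is taken as an elementwise subtraction, DFRL is a two-layer residual MLP, MHAAP uses one learned query per head, the prediction is read from the $T_\text{reg}$ position, and a sigmoid maps outputs to [0, 1]. All sizes and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, D, H = 3, 16, 32, 4  # feature layers, tokens per layer, channels, heads (all hypothetical)
T = L + 1                  # pooled tokens plus the regression token

def dense(d_in, d_out):
    return rng.normal(0, 0.1, (d_in, d_out)), np.zeros(d_out)

def mlp(x, layers):
    # ReLU between layers, no activation after the last one.
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dfrl(f_delta, layers):
    # Assumed form of DFRL: a residual correction added to the raw difference.
    return f_delta + mlp(f_delta, layers)

def mhaap(f, queries, wk, wv):
    # Assumed form of MHAAP: each head attends over the N tokens with its own
    # learned query, so the N tokens are pooled into a single D-vector.
    n = f.shape[0]
    k = (f @ wk).reshape(n, H, D // H)
    v = (f @ wv).reshape(n, H, D // H)
    scores = np.einsum("hd,nhd->hn", queries, k) / np.sqrt(D // H)
    attn = softmax(scores, axis=1)
    return np.einsum("hn,nhd->hd", attn, v).reshape(D)

def mixer_layer(tokens, token_mlp, channel_mlp):
    # Token mixing acts across the T tokens, channel mixing across the D channels.
    tokens = tokens + mlp(layer_norm(tokens).T, token_mlp).T
    return tokens + mlp(layer_norm(tokens), channel_mlp)

params = {
    "dfrl": [[dense(D, D), dense(D, D)] for _ in range(L)],
    "query": [rng.normal(0, 0.1, (H, D // H)) for _ in range(L)],
    "wk": [dense(D, D)[0] for _ in range(L)],
    "wv": [dense(D, D)[0] for _ in range(L)],
    "t_reg": rng.normal(0, 0.1, D),
    "mixer": [([dense(T, 2 * T), dense(2 * T, T)],
               [dense(D, 2 * D), dense(2 * D, D)]) for _ in range(4)],
    "head": [dense(D, D), dense(D, D), dense(D, 2)],
}

def predict(f0, fq, p):
    pooled = []
    for l in range(L):
        f_delta = f0[l] - fq[l]                # initial difference features
        f_star = dfrl(f_delta, p["dfrl"][l])   # refined difference features
        pooled.append(mhaap(f_star, p["query"][l], p["wk"][l], p["wv"][l]))
    tokens = np.vstack([p["t_reg"]] + pooled)  # shape (L + 1, D)
    for token_mlp, channel_mlp in p["mixer"]:  # 4 MLP-Mixer layers
        tokens = mixer_layer(tokens, token_mlp, channel_mlp)
    logits = mlp(tokens[0], p["head"])         # 3-layer MLP on the regression token
    return 1.0 / (1.0 + np.exp(-logits))       # [SUR, SMR] squashed into (0, 1)

# Random stand-ins for the multi-layer features of an original and a compressed image.
f0 = rng.normal(size=(L, N, D))
fq = f0 + 0.1 * rng.normal(size=(L, N, D))
sur, smr = predict(f0, fq, params)
print(sur, smr)
```

With untrained weights the two outputs are meaningless numbers. The sketch only shows how per-layer difference features collapse to one token each before the mixer fuses them with the regression token.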