On Optimizing Image Codecs for VMAF NEG: Analysis, Issues, and a Robust Loss Proposal

Florian Fingscheidt, Alexander Karabutov, Panqi Jia, Elena Alshina, Jörn Ostermann

TL;DR

This paper identifies and analyzes the remaining vulnerability of VMAF NEG to attacks, and proposes a robust loss that includes VMAF NEG for fine-tuning either the encoder or the decoder of an image codec.

Abstract

The VMAF (video multi-method assessment fusion) metric for image and video coding has recently gained popularity, as it is supposed to correlate highly with human perception. This makes training, and particularly fine-tuning, of machine-learned codecs on this metric interesting. However, VMAF has been shown to be attackable: for example, unsharpening an image can lead to a gain in VMAF quality while decreasing the quality in human perception. A particular version of VMAF called VMAF NEG has been designed to be more robust against such attacks and should therefore be more useful for fine-tuning codecs. In this paper, our contributions are threefold. First, we identify and analyze the remaining vulnerability of VMAF NEG to attacks, particularly the attack that consists in employing VMAF NEG for image codec fine-tuning. Second, to benefit from VMAF NEG's high correlation with human perception, we propose a robust loss that includes VMAF NEG for fine-tuning either the encoder or the decoder. Third, we support our quantitative objective results with perceptual impressions of some image examples.

Paper Structure (10 sections, 1 equation, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Samples from fine-tuning decoders using only VMAF NEG as distortion loss ($\alpha=\beta=0, \gamma=1$).
  • Figure 2: Patches from the fine-tuned encoders (top) and fine-tuned decoders (bottom). The white asterisk ($\ast$) marks the best hyperparameter setting for encoder fine-tuning (top) or decoder fine-tuning (bottom), respectively. For better quality inspection, the reader is referred to the digital version on screen.