VMAF Re-implementation on PyTorch: Some Experimental Results

Kirill Aistov, Maxim Koroteev

TL;DR

The paper presents a PyTorch re-implementation of VMAF to enable differentiable optimization, addressing claims that VMAF is non-differentiable. It validates gradient behavior through gradient checking and demonstrates that VMAF can serve as a training loss: in one experiment, a single 7×7 preprocessing filter learned via SGD outperforms unsharp masking in VMAF-based quality improvement. The PyTorch VMAF differs from the libvmaf reference by $<10^{-2}$ VMAF units, its gradients are well behaved, and its timing is practical for offline training and real-time filtering; the approach is validated on HEVC RD curves and Netflix data. Overall, the work enables differentiable VMAF-based optimization for video processing tasks and provides insights into the practical use of VMAF as a learning objective.

Abstract

Based on the standard VMAF implementation, we propose an implementation of VMAF using the PyTorch framework. Comparisons of this implementation with the standard one (libvmaf) show a discrepancy $\lesssim 10^{-2}$ in VMAF units. We investigate gradient computation when VMAF is used as an objective function and demonstrate that training with this function does not result in ill-behaved gradients. The implementation is then used to train a preprocessing filter, whose performance is shown to be superior to that of the unsharp masking filter. The resulting filter is also easy to implement and can be applied in video processing tasks to improve video compression. This is confirmed by the results of numerical experiments.
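
The workflow described in the abstract (check gradients, then learn one 7×7 filter by SGD) can be illustrated with a minimal sketch. It is not the authors' code. The names `apply_filter` and `stand_in_loss` are hypothetical, the frames are synthetic, and plain MSE stands in for the negated VMAF score so that the example runs without the PyTorch VMAF implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

def apply_filter(frames, kernel):
    """Convolve single-channel frames (N, 1, H, W) with one 7x7 kernel."""
    return F.conv2d(frames, kernel.view(1, 1, 7, 7), padding=3)

def stand_in_loss(target, processed):
    """Hypothetical placeholder for a differentiable quality objective.

    In the paper's setting the objective is built on the PyTorch VMAF;
    MSE is an assumption made here only to keep the sketch self-contained.
    """
    return F.mse_loss(processed, target)

# Synthetic luma frames and a target produced by a known sharpening kernel.
frames = torch.rand(8, 1, 32, 32)
identity = torch.zeros(7, 7)
identity[3, 3] = 1.0
box = torch.full((7, 7), 1.0 / 49.0)
target = apply_filter(frames, identity + 0.5 * (identity - box))

# The only trainable parameters are the 49 filter coefficients,
# initialised to the identity (pass-through) filter.
kernel = identity.clone().requires_grad_(True)
optimizer = torch.optim.SGD([kernel], lr=0.05)

losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = stand_in_loss(target, apply_filter(frames, kernel))
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

# Gradient check in double precision: autograd gradients are compared
# with finite-difference estimates for the filtering operation.
k64 = torch.rand(7, 7, dtype=torch.float64, requires_grad=True)
x64 = torch.rand(1, 1, 12, 12, dtype=torch.float64)
grad_ok = torch.autograd.gradcheck(lambda k: apply_filter(x64, k), (k64,))
```

With a differentiable VMAF available, the stand-in loss would be replaced by the negated score, leaving the optimisation loop unchanged.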

Paper Structure (3 sections, 17 equations, 3 figures, 1 table)

Table of Contents

  1. VIF
  2. ADM
  3. Motion

Figures (3)

  • Figure 1: Visual comparison of the unsharp masking filter (b) with the optimal filter (c), constructed as described in the main text, and the reference image (a). The image is a frame extracted from the publicly available Netflix data set [netflix].
  • Figure 2: VMAF vs PSNR trade-off for the optimal filter. The computations were done on frames from the Netflix public dataset after applying our filter and the unsharp masking filter with various values of the parameter $\alpha$ (shown next to the points). Note that for $\alpha\to 0$ the VMAF score converges to $\sim 97.4$ instead of $100$; this occurs when the Motion feature of VMAF is equal to zero. Both filters have size $7\times 7$.
  • Figure 3: VMAF RD curves obtained using a synthetic stream of video game content. The measurements were made at four target bitrates: $4000$, $6000$, $8000$, and $9500$ kbps. For the unsharp masking filter $\alpha=0.5$; for the optimal filter $\alpha=0.25$.
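
The captions refer to a $7\times 7$ unsharp masking filter controlled by $\alpha$. The sketch below shows one common form of such a filter. The exact definition used in the paper is not given in this section, so the Gaussian blur, the value of `sigma`, the replicate padding, and the function names are all assumptions. In this form the kernel reduces to the identity as $\alpha\to 0$.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(size=7, sigma=1.5):
    """Normalised 2D Gaussian kernel (sigma is an assumed value)."""
    coords = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-coords**2 / (2 * sigma**2))
    k = torch.outer(g, g)
    return k / k.sum()

def unsharp_kernel(alpha, size=7, sigma=1.5):
    """Assumed form: delta + alpha * (delta - blur); alpha = 0 gives identity."""
    delta = torch.zeros(size, size)
    delta[size // 2, size // 2] = 1.0
    return delta + alpha * (delta - gaussian_kernel(size, sigma))

def apply_kernel(frame, kernel):
    """Filter a (N, 1, H, W) frame, replicating borders to keep its size."""
    size = kernel.shape[-1]
    padded = F.pad(frame, (size // 2,) * 4, mode="replicate")
    return F.conv2d(padded, kernel.view(1, 1, size, size))

frame = torch.rand(1, 1, 16, 16)
sharpened = apply_kernel(frame, unsharp_kernel(alpha=0.5))
```

Because the kernel coefficients sum to one for any $\alpha$, flat regions keep their brightness and only local contrast is changed.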