An Efficient Quality Metric for Video Frame Interpolation Based on Motion-Field Divergence

Conall Daly, Darren Ramsook, Anil Kokaram

TL;DR

The paper tackles the challenge of assessing video frame interpolation (VFI) quality in a way that accounts for temporal coherence. It introduces PSNR_DIV, a full-reference metric that weights mean-squared error by motion-field divergence to emphasize temporally inconsistent regions, a technique adapted from archival film restoration. On the BVI-VFI dataset, PSNR_DIV matches or exceeds FloLPIPS in correlation with human scores while being 2.5× faster and using 4× less memory, and it remains robust to the choice of motion estimator. The approach enables fast quality evaluation and practical use as a training loss for VFI models, with code available at www.github.com/conalld/psnr-div.

Abstract

Video frame interpolation is a fundamental tool for temporal video enhancement, but existing quality metrics struggle to evaluate the perceptual impact of interpolation artefacts effectively. Metrics like PSNR, SSIM and LPIPS ignore temporal coherence. State-of-the-art quality metrics tailored towards video frame interpolation, like FloLPIPS, have been developed but suffer from computational inefficiency that limits their practical application. We present $\text{PSNR}_{\text{DIV}}$, a novel full-reference quality metric that enhances PSNR through motion divergence weighting, a technique adapted from archival film restoration where it was developed to detect temporal inconsistencies. Our approach highlights singularities in motion fields, which are then used to weight image errors. Evaluation on the BVI-VFI dataset (180 sequences across multiple frame rates, resolutions and interpolation methods) shows that $\text{PSNR}_{\text{DIV}}$ achieves statistically significant improvements: +0.09 Pearson Linear Correlation Coefficient over FloLPIPS, while being 2.5$\times$ faster and using 4$\times$ less memory. Performance remains consistent across all content categories and is robust to the motion estimator used. The efficiency and accuracy of $\text{PSNR}_{\text{DIV}}$ enable fast quality evaluation and practical use as a loss function for training neural networks for video frame interpolation tasks. An implementation of our metric is available at www.github.com/conalld/psnr-div.

Paper Structure

This paper contains 14 sections, 11 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Flow diagram for our proposed metric. Inputs and outputs are annotated above flow arrows. Operations are denoted by blue boxes. Our metric uses the divergence of the motion field for weighting the mean squared error between pixel values of the interpolated and reference sequences.
  • Figure 2: Frames from two reference sequences in the BVI-VFI dataset are shown in (a-b). The corresponding interpolated frames generated by DVF (c-d) and ST-MFNet (e-f) are overlaid with values of our divergence measure $\mathbf{d}(\mathbf{x})$. Regions where $\mathbf{d}(\mathbf{x})\geq 0.01$ are shown in yellow and regions where $\mathbf{d}(\mathbf{x})=0$ are shown in blue. Crops are shown inset, outlined in red. This demonstrates the effectiveness of divergence as a measure for detecting regions of temporal inconsistency.
  • Figure 3: Plots of variation in (a) PLCC and (b) SRCC with respect to motion divergence threshold (T) for different motion estimators. Values of the threshold for which correlation is maximised are marked by a yellow square.
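
The pipeline described in the Figure 1 caption (motion field, divergence, weighted mean squared error, PSNR) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is available at the repository linked above. The function names, the weighting rule `1 + d`, the default threshold and the use of a caller-supplied flow field are all assumptions made for illustration; the summary does not give the exact formula.

```python
import numpy as np


def flow_divergence(flow):
    """Divergence of a dense motion field.

    flow has shape (H, W, 2): flow[..., 0] is the horizontal (x) component
    and flow[..., 1] is the vertical (y) component.
    """
    du_dx = np.gradient(flow[..., 0], axis=1)  # change of u along columns
    dv_dy = np.gradient(flow[..., 1], axis=0)  # change of v along rows
    return du_dx + dv_dy


def divergence_weighted_psnr(ref, interp, flow, threshold=0.01, peak=255.0):
    """Hypothetical divergence-weighted PSNR (illustrative only).

    Assumption: pixels whose absolute divergence is at least `threshold`
    get weight 1 + |div|; all other pixels get weight 1.
    """
    ref = np.asarray(ref, dtype=np.float64)
    interp = np.asarray(interp, dtype=np.float64)

    sq_err = (ref - interp) ** 2
    if sq_err.ndim == 3:
        # Average colour channels to get one error value per pixel.
        sq_err = sq_err.mean(axis=2)

    d = np.abs(flow_divergence(np.asarray(flow, dtype=np.float64)))
    d = np.where(d >= threshold, d, 0.0)  # suppress small divergence values
    weights = 1.0 + d

    weighted_mse = np.sum(weights * sq_err) / np.sum(weights)
    if weighted_mse == 0:
        return float("inf")
    return float(10.0 * np.log10(peak ** 2 / weighted_mse))
```

With a divergence-free flow every weight equals 1 and the function reduces to ordinary PSNR. When errors coincide with regions of high divergence, the weighted error grows and the score falls below plain PSNR, which is the qualitative behaviour the captions describe.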