Video Quality Assessment with Texture Information Fusion for Streaming Applications
Vignesh V Menon, Prajit T Rajendran, Reza Farahani, Klaus Schoeffmann, Christian Timmerer
TL;DR
The paper addresses the need for fast, perceptually aligned video quality assessment in streaming by proposing VQ-TIF, a reduced-reference VQA that fuses DCT-energy-based texture features with SSIM through an LSTM to estimate VMAF. It extracts the texture features $E_Y$, $h$, and $L_Y$ from the luma channel and combines them with SSIM per frame to produce per-chunk estimates, which are then averaged to give the segment quality. On UHD content, VQ-TIF achieves $PCC=0.96$ and $MAE=2.71$ relative to ground-truth VMAF, while delivering a $9.14\times$ speed-up and an $89.44\%$ reduction in energy consumption. The method is trained and evaluated on the Inter4K UHD dataset with SDR content; it shows potential for real-time VQA in streaming, and extending it to HDR and higher resolutions is left to future work.
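The per-frame fusion and per-chunk averaging described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the chunk length `chunk_len` and the callable `predict_chunk` (a stand-in for the trained LSTM) are hypothetical, and the inputs are assumed to be per-frame arrays of $E_Y$, $h$, $L_Y$, and SSIM that were computed elsewhere.

```python
import numpy as np


def build_chunks(e_y, h, l_y, ssim, chunk_len):
    """Stack per-frame features and split them into fixed-length chunks.

    chunk_len is an assumed parameter; the summary does not state it.
    Returns an array of shape (n_chunks, chunk_len, 4).
    """
    feats = np.stack([e_y, h, l_y, ssim], axis=1).astype(np.float64)  # (T, 4)
    n_chunks = feats.shape[0] // chunk_len
    if n_chunks == 0:
        raise ValueError("not enough frames for a single chunk")
    # Drop trailing frames that do not fill a whole chunk.
    return feats[: n_chunks * chunk_len].reshape(n_chunks, chunk_len, 4)


def segment_quality(chunks, predict_chunk):
    """Average the per-chunk estimates into one segment-level score.

    predict_chunk is a hypothetical callable mapping a (chunk_len, 4)
    feature sequence to a scalar quality estimate.
    """
    scores = np.array([predict_chunk(c) for c in chunks], dtype=np.float64)
    return float(scores.mean())
```

In practice `predict_chunk` would wrap the trained sequence model; any callable with the same input and output shape can be substituted for experimentation.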
Abstract
The rise in video streaming applications has increased the demand for video quality assessment (VQA). In 2016, Netflix introduced Video Multi-Method Assessment Fusion (VMAF), a full-reference VQA metric that strongly correlates with perceptual quality, but its computation is time-intensive. We propose a Discrete Cosine Transform (DCT)-energy-based VQA with texture information fusion (VQ-TIF) model for video streaming applications that determines the visual quality of the reconstructed video compared to the original video. VQ-TIF extracts Structural Similarity (SSIM) and spatiotemporal features of the frames from the original and reconstructed videos and fuses them using a long short-term memory (LSTM)-based model to estimate the visual quality. Experimental results show that VQ-TIF estimates the visual quality with a Pearson Correlation Coefficient (PCC) of 0.96 and a Mean Absolute Error (MAE) of 2.71, on average, compared to the ground truth VMAF scores. Additionally, VQ-TIF estimates the visual quality 9.14 times faster than the state-of-the-art VMAF implementation, with an 89.44% reduction in energy consumption, assuming an Ultra HD (2160p) display resolution.
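The two accuracy figures quoted above, PCC and MAE, compare predicted scores with ground-truth VMAF scores. A minimal sketch of how such metrics are commonly computed is given below, assuming both score lists are available as equal-length arrays; the function names are illustrative and not taken from the paper.

```python
import numpy as np


def pcc(pred, truth):
    """Pearson Correlation Coefficient between two equal-length score arrays."""
    pred = np.asarray(pred, dtype=np.float64)
    truth = np.asarray(truth, dtype=np.float64)
    pc = pred - pred.mean()
    tc = truth - truth.mean()
    denom = np.sqrt((pc ** 2).sum() * (tc ** 2).sum())
    if denom == 0.0:
        raise ValueError("PCC is undefined when either input is constant")
    return float((pc * tc).sum() / denom)


def mae(pred, truth):
    """Mean Absolute Error between two equal-length score arrays."""
    pred = np.asarray(pred, dtype=np.float64)
    truth = np.asarray(truth, dtype=np.float64)
    return float(np.mean(np.abs(pred - truth)))
```

A PCC close to 1 indicates that predictions track the ground truth linearly, while MAE reports the average deviation in VMAF score units.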
