ReLaX-VQA: Residual Fragment and Layer Stack Extraction for Enhancing Video Quality Assessment

Xinyi Wang; Angeliki Katsenou; David Bull

TL;DR

ReLaX-VQA tackles No-Reference Video Quality Assessment (NR-VQA) for diverse User-Generated Content (UGC) by combining selective spatio-temporal fragment sampling with layer-stacked deep features from ResNet-50 and a Vision Transformer (ViT). The framework comprises three modules: a Spatio-Temporal Fragment Sampling module that extracts salient fragments (RFs/MF) from frame differences and optical flow; a DNN Feature Extraction module with multi-layer fusion; and a lightweight MLP regressor trained with a composite loss combining mean absolute error (MAE) and a rank term. Empirically, it achieves state-of-the-art or competitive performance on four NR-VQA benchmarks and on the large-scale LSVQ dataset, particularly when fine-tuned, and generalizes well across resolutions. The work demonstrates that focusing on spatio-temporal regions with high variability and combining local and global feature representations yields robust NR-VQA results. Open-source code and pretrained models are released to support broader adoption.
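The TL;DR states only that the regressor is trained with a composite MAE and rank loss; it does not give the exact form of the rank term. The sketch below assumes a common pairwise formulation: a hinge penalty on every pair of videos whose predicted order disagrees with the ground-truth order, averaged over pairs and added to the MAE with a hypothetical weight `rank_weight`. The function name `composite_loss` is illustrative and is not taken from the ReLaX-VQA code.

```python
import numpy as np


def composite_loss(pred, target, rank_weight=1.0):
    """MAE plus a pairwise rank penalty.

    Assumed formulation for illustration only; the rank term and the
    rank_weight default are hypothetical, not the published settings.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Mean absolute error between predicted and ground-truth quality scores.
    mae = np.mean(np.abs(pred - target))

    # Pairwise score differences for every (i, j) pair in the batch.
    dp = pred[:, None] - pred[None, :]
    dt = target[:, None] - target[None, :]

    # Hinge penalty: positive only when the predicted order of a pair
    # contradicts the ground-truth order (zero on the diagonal and for ties).
    rank = np.maximum(0.0, -dp * np.sign(dt))

    n = len(pred)
    num_pairs = n * (n - 1)
    rank_term = rank.sum() / num_pairs if num_pairs > 0 else 0.0
    return mae + rank_weight * rank_term
```

When the predictions reproduce the ground truth exactly, both terms vanish; reversing the order of three scores (`[3, 2, 1]` against `[1, 2, 3]`) gives an MAE of 4/3 and a rank term of 4/3. In training, a differentiable framework would typically replace NumPy; NumPy simply keeps the example self-contained.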

Abstract

With the rapid growth of User-Generated Content (UGC) exchanged between users and sharing platforms, the need for video quality assessment in the wild is increasingly evident. UGC is typically captured with consumer devices and undergoes multiple rounds of compression (transcoding) before reaching the end user. Traditional quality metrics that require the original content as a reference are therefore unsuitable. In this paper, we propose ReLaX-VQA, a novel No-Reference Video Quality Assessment (NR-VQA) model that addresses the challenge of evaluating the quality of diverse video content without access to the original uncompressed videos. ReLaX-VQA uses frame differences to select spatio-temporal fragments, together with different representations of the spatial features of the sampled frames, to better capture spatial and temporal variations in quality between neighbouring frames. The model further improves feature abstraction by stacking layers of deep neural network features from Residual Networks (ResNets) and Vision Transformers (ViTs). Extensive testing on four UGC datasets shows that ReLaX-VQA consistently outperforms existing NR-VQA methods, achieving an average Spearman rank-order correlation coefficient (SRCC) of 0.8658 and Pearson linear correlation coefficient (PLCC) of 0.8873. Open-source code and trained models to facilitate further research on and applications of NR-VQA are available at https://github.com/xinyiW915/ReLaX-VQA.
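To illustrate how frame differences can drive fragment selection, here is a minimal NumPy sketch. It assumes grayscale frames, a grid of non-overlapping square patches, and a per-patch score equal to the mean absolute residual between consecutive frames. The patch size, the number of fragments `k` and the function name `select_residual_fragments` are hypothetical, and the optical-flow component mentioned in the TL;DR is deliberately left out.

```python
import numpy as np


def select_residual_fragments(prev_frame, curr_frame, patch=32, k=4):
    """Return the k patches of curr_frame with the largest frame-difference energy.

    Hypothetical sketch: patch size, k and the mean-absolute-residual score
    are illustrative choices, not the published ReLaX-VQA settings.
    """
    prev_frame = np.asarray(prev_frame, dtype=float)
    curr_frame = np.asarray(curr_frame, dtype=float)

    # Frame difference (residual) between consecutive frames.
    residual = np.abs(curr_frame - prev_frame)

    # Grid of non-overlapping patches; partial patches at the edges are dropped.
    h, w = residual.shape
    gh, gw = h // patch, w // patch

    # Mean residual per patch: view the cropped map as (gh, patch, gw, patch).
    cropped = residual[: gh * patch, : gw * patch]
    scores = cropped.reshape(gh, patch, gw, patch).mean(axis=(1, 3))

    # Flat indices of the k highest-scoring patches, most active first.
    top = np.argsort(scores, axis=None)[::-1][:k]
    coords = [divmod(int(i), gw) for i in top]

    # Crop the selected patches from the current frame.
    fragments = [
        curr_frame[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
        for r, c in coords
    ]
    return np.stack(fragments), coords
```

For example, with two 64x64 frames that differ only in the lower-left 32x32 block, `select_residual_fragments(prev, curr, patch=32, k=1)` returns that block with coordinates `[(1, 0)]`. Scoring patches by residual energy concentrates the feature extractor on regions where the content changes most between frames.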

Paper Structure (18 sections, 18 equations, 3 figures, 5 tables, 1 algorithm)

Introduction
Related work
UGC video datasets
NR-VQA models
Advancing the State-of-the-Art
Proposed method
Spatio-Temporal Fragment Sampling module
DNN Feature Extraction module
Quality Regression module
Loss function
Experiments
Experimental setup
UGC Datasets
Evaluation method
Implementation details
...and 3 more sections

Figures (3)

Figure 1: Overview of the proposed ReLaX-VQA framework, showing how the input frames are processed for feature extraction and regression to infer the quality score. Intermediate representations of the fragmented data are visualised. Details of the ResNet-50 Stack (I) and ResNet-50 Pool (II) architectures are given in Fig. 2.
Figure 2: The bespoke architectures of ResNet-50 Stack (I), ResNet-50 Pool (II), and ViT Pool.
Figure 3: Examples sampled from the YouTube-UGC dataset [wang2019youtube]. Top row: TelevisionClip_1080P-68c6.mkv. Bottom row: Sports_2160P-0455.mkv. GT refers to ground-truth quality scores; the predicted scores shown are from ReLaX-VQA without fine-tuning (w/o FT).
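Figure 2 distinguishes "Stack" and "Pool" variants of the feature extractors. One plausible reading, sketched below purely as an assumption, is that a Pool head keeps a globally average-pooled vector from the final layer only, while a Stack head pools several intermediate layers and concatenates them, so the regressor sees both low-level (local) and high-level (global) cues. Random arrays stand in for the outputs of the four ResNet-50 stages on a 224x224 input; the helper names are hypothetical.

```python
import numpy as np


def global_avg_pool(fmap):
    """Average a (C, H, W) feature map over its spatial axes -> (C,) vector."""
    return fmap.mean(axis=(1, 2))


def pool_last_layer(fmaps):
    """Assumed 'Pool'-style head: keep only the deepest feature map."""
    return global_avg_pool(fmaps[-1])


def stack_layers(fmaps):
    """Assumed 'Stack'-style head: pool every layer, then concatenate."""
    return np.concatenate([global_avg_pool(f) for f in fmaps])


# Random stand-ins for ResNet-50 stage outputs on a 224x224 input.
rng = np.random.default_rng(0)
shapes = [(256, 56, 56), (512, 28, 28), (1024, 14, 14), (2048, 7, 7)]
fmaps = [rng.standard_normal(s) for s in shapes]

print(pool_last_layer(fmaps).shape)  # (2048,)
print(stack_layers(fmaps).shape)     # (3840,)
```

The stacked vector (256 + 512 + 1024 + 2048 = 3840 dimensions) is larger than the single-layer one, which is the cost of exposing the regressor to shallower, more local features alongside the final semantic ones.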
