PriorFormer: A UGC-VQA Method with Content and Distortion Priors

Yajing Pei, Shiyu Huang, Yiting Lu, Xin Li, Zhibo Chen

TL;DR

A novel prior-augmented perceptual vision transformer (PriorFormer) for blind video quality assessment (BVQA) of UGC, which boosts adaptability and representation capability for divergent contents and distortions and achieves state-of-the-art performance on three public UGC-VQA datasets.

Abstract

User Generated Content (UGC) videos are susceptible to complicated and varied degradations and contents, which prevents existing blind video quality assessment (BVQA) models from performing well because they lack adaptability to diverse distortions and contents. To mitigate this, we propose a novel prior-augmented perceptual vision transformer (PriorFormer) for the BVQA of UGC, which boosts its adaptability and representation capability for divergent contents and distortions. Concretely, we introduce two powerful priors, i.e., the content and distortion priors, by extracting content and distortion embeddings from two pre-trained feature extractors. We then adopt these two embeddings as adaptive prior tokens, which are fed into the vision transformer backbone jointly with the implicit quality features. Based on this strategy, the proposed PriorFormer achieves state-of-the-art performance on three public UGC-VQA datasets: KoNViD-1K, LIVE-VQC and YouTube-UGC.
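
The prior-token idea described above can be pictured with a short sketch. The following PyTorch snippet is a minimal illustration, not the released implementation: the module and parameter names (PriorTokenViT, content_proj, distortion_proj) and the feature dimensions are assumptions. It simply projects the two pre-trained prior embeddings to the token dimension and prepends them to the per-frame implicit quality tokens before a Transformer encoder.

```python
# Minimal sketch (assumed names/dims, not the authors' code): content and
# distortion prior embeddings injected as extra tokens into a ViT-style encoder.
import torch
import torch.nn as nn


class PriorTokenViT(nn.Module):
    def __init__(self, patch_dim=768, content_dim=2048, distortion_dim=512,
                 embed_dim=768, depth=4, num_heads=8):
        super().__init__()
        # Project the two prior embeddings and the frame tokens to one dimension.
        self.content_proj = nn.Linear(content_dim, embed_dim)
        self.distortion_proj = nn.Linear(distortion_dim, embed_dim)
        self.patch_proj = nn.Linear(patch_dim, embed_dim)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.quality_head = nn.Linear(embed_dim, 1)

    def forward(self, patch_tokens, content_prior, distortion_prior):
        # patch_tokens: (B, N, patch_dim) implicit quality features of one frame
        # content_prior: (B, content_dim), distortion_prior: (B, distortion_dim)
        tokens = self.patch_proj(patch_tokens)
        c_tok = self.content_proj(content_prior).unsqueeze(1)      # (B, 1, D)
        d_tok = self.distortion_proj(distortion_prior).unsqueeze(1)
        x = torch.cat([c_tok, d_tok, tokens], dim=1)               # prepend prior tokens
        x = self.encoder(x)
        # Pool encoded tokens into a per-frame quality feature/score.
        return self.quality_head(x.mean(dim=1)).squeeze(-1)
```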

Paper Structure

This paper contains 14 sections, 4 equations, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: The MOS distribution of different contents.
  • Figure 2: The MOS distribution of different distortion types.
  • Figure 3: The overall framework of the proposed method. Frame features are extracted by an online feature extractor, while content and distortion prior features are extracted by fixed, pre-trained content and distortion prior feature extractors, respectively. The extracted features are projected accordingly and fed into the Transformer encoder to extract spatial features of the frames. Then, a GRU network and subjectively inspired temporal pooling layers (i.e., Minimal Pooling, MP, and Softmin-Weighted Average Pooling, SWAP) are used to model temporal-memory effects.
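
The temporal branch in Figure 3 (GRU followed by MP and SWAP) can be sketched as follows. This is a hedged illustration under assumed names and settings (TemporalQuality, the window size, the 0.5/0.5 blend), not the paper's exact formulation: a GRU regresses a per-frame quality score, MP takes the minimum over a preceding window to model the memory of quality drops, and SWAP softmin-weights a window of subsequent frames so that low-quality frames dominate the current impression.

```python
# Minimal sketch (assumptions, not the authors' code): GRU over per-frame
# features followed by Minimal Pooling (MP) and Softmin-Weighted Average
# Pooling (SWAP), then a simple blend into a video-level score.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalQuality(nn.Module):
    def __init__(self, feat_dim=768, hidden_dim=32):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, frame_feats, window=12):
        # frame_feats: (B, T, feat_dim) spatial features per frame
        h, _ = self.gru(frame_feats)
        q = self.score(h).squeeze(-1)                    # (B, T) per-frame quality

        # MP: memory quality as the minimum over a preceding window.
        mem = torch.stack(
            [q[:, max(0, t - window):t + 1].min(dim=1).values
             for t in range(q.size(1))], dim=1)

        # SWAP: softmin-weighted average over subsequent frames, so lower
        # scores receive larger weights.
        cur = torch.stack(
            [(q[:, t:t + window] * F.softmin(q[:, t:t + window], dim=1)).sum(dim=1)
             for t in range(q.size(1))], dim=1)

        # Blend memory and current terms (0.5/0.5 is an assumption), then
        # average over time for the video-level score.
        return (0.5 * mem + 0.5 * cur).mean(dim=1)       # (B,)
```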