Exploring Simple Siamese Network for High-Resolution Video Quality Assessment

Guotao Shen; Ziheng Yan; Xin Jin; Longhai Wu; Jie Chen; Ilhyun Cho; Cheul-Hee Hahm

TL;DR

This paper tackles high-resolution video quality assessment by arguing that technical quality must be interpreted in a semantic context. It proposes SiamVQA, a lightweight Siamese architecture that shares weights between technical and aesthetic branches and employs a dual cross-attention fusion to produce per-pixel quality maps that are then pooled for a final score. The model achieves state-of-the-art results on high-resolution benchmarks such as LSVQ$_{1080p}$, LIVE-Qualcomm, and YouTube-UGC, while remaining competitive on lower-resolution data, and does so with fewer parameters and faster runtime than several prior two-branch approaches. Overall, SiamVQA demonstrates that semantic-aware technical perception and effective multimodal fusion can significantly improve VQA performance without heavy model complexity.

Abstract

In video quality assessment (VQA) research, the two-branch network has emerged as a promising solution. It decouples VQA into separate technical and aesthetic branches that measure the perception of low-level distortions and high-level semantics, respectively. However, we argue that while the technical and aesthetic perspectives are complementary, the technical perspective itself should be measured in a semantic-aware manner. We hypothesize that the existing technical branch struggles to perceive the semantics of high-resolution videos because it is trained on local mini-patches sampled from videos. This issue can be hidden by apparently good results on low-resolution videos, but it becomes critical for high-resolution VQA. This work introduces SiamVQA, a simple but effective Siamese network for high-resolution VQA. SiamVQA shares weights between the technical and aesthetic branches, enhancing the semantic perception ability of the technical branch to facilitate technical-quality representation learning. Furthermore, it integrates a dual cross-attention layer for fusing technical and aesthetic features. SiamVQA achieves state-of-the-art accuracy on high-resolution benchmarks and competitive results on lower-resolution benchmarks. Code will be available at: https://github.com/srcn-ivl/SiamVQA
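
The two ideas named in the abstract, weight sharing between the branches and dual cross-attention fusion, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The one-layer `shared_encoder`, the toy token inputs, the single-head attention without learned projections, and the mean pooling into a scalar score are all assumptions chosen to keep the example small; every function name here is hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, context):
    """Scaled dot-product weights: each query row attends over all context rows."""
    d = len(queries[0])
    return [
        softmax([sum(q * c for q, c in zip(qrow, crow)) / math.sqrt(d) for crow in context])
        for qrow in queries
    ]


def cross_attention(queries, context):
    """Single-head cross-attention (assumption: no learned Q/K/V projections)."""
    return matmul(attention_weights(queries, context), context)


def shared_encoder(tokens, weight):
    """Hypothetical stand-in for a backbone: one linear map followed by ReLU."""
    return [[max(0.0, v) for v in row] for row in matmul(tokens, weight)]


def siamese_score(tech_tokens, aes_tokens, weight):
    """Encode both inputs with the SAME weights, fuse in both directions, pool."""
    tech = shared_encoder(tech_tokens, weight)   # technical branch
    aes = shared_encoder(aes_tokens, weight)     # aesthetic branch, same weights
    tech_fused = cross_attention(tech, aes)      # technical queries, aesthetic context
    aes_fused = cross_attention(aes, tech)       # aesthetic queries, technical context
    fused = tech_fused + aes_fused               # stack fused tokens from both directions
    per_token = [sum(row) / len(row) for row in fused]
    return sum(per_token) / len(per_token)       # assumption: plain mean pooling


# Toy usage: 3 "technical" tokens and 2 "aesthetic" tokens of dimension 4.
random.seed(0)
dim = 4
weight = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
tech_tokens = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
aes_tokens = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
score = siamese_score(tech_tokens, aes_tokens, weight)
print(score)
```

Because both branches call `shared_encoder` with the same `weight`, a gradient update in a real model would change the technical and aesthetic encoders together. That shared update is the Siamese property the abstract relies on to give the technical branch access to semantic features.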
