Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment

Yongxu Liu; Yinghui Quan; Guoyao Xiao; Aobo Li; Jinjian Wu

TL;DR

The paper addresses the challenge of capturing both local details and global semantics in image and video quality assessment without increasing model complexity. It introduces Scaling and Masking (SAMA), a data-sampling paradigm that builds a multi-granularity pyramid, samples fragments from each scale, and applies a scale-aware mask to produce a fixed-size input for a single-branch transformer-based model. Across IQA and VQA benchmarks, SAMA significantly improves baseline single-branch methods and achieves competitive performance with multi-branch approaches, all with comparable computational cost. The work demonstrates that careful data sampling and masking can realize multi-scale perception with minimal architectural changes, and it explores variants of relative scale encoding with generally modest gains. This has practical impact for scalable, high-performance quality assessment in real-world, high-resolution content.

Abstract

Quality assessment of images and videos emphasizes both local details and global semantics, whereas general data sampling methods (e.g., resizing, cropping, or grid-based fragment sampling) fail to capture them simultaneously. To address this deficiency, current approaches adopt multi-branch models that take multi-resolution data as input, which increases model complexity. In this work, instead of stacking up models, we explore a more elegant data sampling method (named SAMA, for scaling and masking) that compacts both local and global content into a regular input size. The basic idea is to first scale the data into a pyramid and then reduce the pyramid to a regular data dimension with a masking strategy. Benefiting from the spatial and temporal redundancy in images and videos, the processed data maintains multi-scale characteristics at a regular input size and can thus be processed by a single-branch model. We verify the sampling method in image and video quality assessment. Experiments show that our sampling method significantly improves the performance of current single-branch models and achieves performance competitive with multi-branch models without extra model complexity. The source code will be available at https://github.com/Sissuire/SAMA.
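
The scale-then-mask idea from the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the area-average downscaling, the factor-of-2 pyramid, the diagonal mask pattern, and the default sizes are all assumptions chosen for illustration. It builds a pyramid, samples one patch per grid cell at every scale, and then uses a per-cell scale mask to keep exactly one scale for each cell, so the output has the same fixed size regardless of the number of scales.

```python
import numpy as np


def downscale(img, factor):
    """Area-average downscale of an (H, W, C) array by an integer factor (assumed resampler)."""
    h, w, c = img.shape
    h2, w2 = h // factor, w // factor
    img = img[:h2 * factor, :w2 * factor]
    return img.reshape(h2, factor, w2, factor, c).mean(axis=(1, 3))


def build_pyramid(img, num_scales):
    """Level 0 is the original image; level s is downscaled by 2**s (assumed ratio)."""
    return [img if s == 0 else downscale(img, 2 ** s) for s in range(num_scales)]


def sample_fragments(img, grid, patch, rng):
    """Grid-based fragment sampling: one random patch per grid cell, tiled together."""
    h, w, c = img.shape
    cell_h, cell_w = h // grid, w // grid
    if cell_h < patch or cell_w < patch:
        raise ValueError("image too small for the requested grid and patch size")
    out = np.empty((grid * patch, grid * patch, c), dtype=img.dtype)
    for i in range(grid):
        for j in range(grid):
            y = i * cell_h + rng.integers(0, cell_h - patch + 1)
            x = j * cell_w + rng.integers(0, cell_w - patch + 1)
            out[i * patch:(i + 1) * patch, j * patch:(j + 1) * patch] = \
                img[y:y + patch, x:x + patch]
    return out


def scale_mask(grid, num_scales):
    """Assign a scale index to each grid cell (hypothetical diagonal pattern)."""
    i, j = np.indices((grid, grid))
    return (i + j) % num_scales


def sama_sample(img, grid=7, patch=32, num_scales=2, seed=0):
    """Return a fixed-size multi-scale sample and the per-cell scale mask."""
    rng = np.random.default_rng(seed)
    pyramid = build_pyramid(img.astype(np.float32), num_scales)
    fragments = [sample_fragments(level, grid, patch, rng) for level in pyramid]
    mask = scale_mask(grid, num_scales)
    # Expand the per-cell mask to pixel resolution.
    pixel_mask = np.kron(mask, np.ones((patch, patch), dtype=int))
    out = np.zeros_like(fragments[0])
    for s, frag in enumerate(fragments):
        out[pixel_mask == s] = frag[pixel_mask == s]
    return out, mask
```

With the illustrative defaults, the output is always `grid * patch` pixels per side (224 x 224 for a 7 x 7 grid of 32 x 32 patches), so a single-branch model sees the same input size whether one or several scales are mixed in. The input must be large enough that the coarsest pyramid level still fits one patch per cell.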

Paper Structure (26 sections, 7 equations, 7 figures, 4 tables)

Introduction
Related Work
Blind IQA
Blind VQA
Proposed Method
Overall Architecture
Grid-based Fragment Sampling
Scaling and Masking in Image
SAMA for Video
Model Architecture and Implementation Details
Relative Scale Encoding
SAMA-W
SAMA-SE
SAMA-RSB-A
SAMA-RSB-M
...and 11 more sections

Figures (7)

Figure 1: An illustration of data sampling methods in quality assessment. Scaling would cause detail loss, while cropping might harm global perception. The proposed method scales the data into a pyramid and masks the pyramid based on data redundancy. The resulting data holds the multi-scale nature with a regular input size.
Figure 2: The workflow of SAMA. Image or video data is first scaled into a multi-granularity pyramid via interpolation. Then fragments are sampled in each scale. Afterwards, spatial/temporal masking is constructed to tune the hierarchical fragments into a regular sampling size. The data after SAMA is fed into a base model for quality estimation.
Figure 3: The illustration of spatial and temporal masks. (a) and (b) are spatial masks for images, and the last three are temporal masks for videos. Different intensities indicate the different scales.
Figure 4: An example of sampling result for image.
Figure 5: The illustration of training and testing on LSVQ
...and 2 more figures
