Subjective and Objective Quality Assessment Methods of Stereoscopic Videos with Visibility Affecting Distortions

Sria Biswas, Balasubramanyam Appina, Priyanka Kokil, Sumohana S Channappayya

TL;DR

This work addresses quality assessment of stereoscopic 3D videos under visibility-distorting conditions such as fog and haze by introducing the VAD stereo dataset (12 pristine references and 360 distorted stimuli) and an unsupervised completely blind NR QA model, CBSE. The CBSE method fuses binocular views into cyclopean frames, analyzes NSS via a multi-scale spherical steerable pyramid, fits UGGD parameters, and uses MVG modeling with Bhattacharyya distances to produce a final quality score through $CBSE = S_\mu \times S_{\Sigma}$. The dataset is evaluated with 24 human observers, producing DMOS that validate the subjective study, while CBSE is compared against a wide range of 2D/3D FR and NR IQA/VQA baselines across IRCCYN, LFOVIA Ph1/Ph2, and VAD datasets, showing competitive or superior performance without training. The work advances S3D QoE research by providing a richly annotated VAD dataset and a robust, unsupervised NR metric with potential extensions to VR/AR content.
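
To make the pooling step concrete, here is a minimal sketch of how two distance terms between a pristine and a test MVG fit could be computed and multiplied. It assumes that the two scores correspond to the mean term and the covariance term of the standard Bhattacharyya distance between two Gaussians; the paper's exact definitions may differ. The function names (`fit_mvg`, `bhattacharyya_terms`, `cbse_like_score`) and the random feature matrices are hypothetical and for illustration only.

```python
import numpy as np

def fit_mvg(features):
    """Fit a multivariate Gaussian to a (num_samples, num_features) array."""
    features = np.asarray(features, dtype=float)
    mu = features.mean(axis=0)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    return mu, cov

def bhattacharyya_terms(mu_p, cov_p, mu_t, cov_t):
    """Return (mean term, covariance term) of the Gaussian Bhattacharyya distance.

    Assumption: these two terms stand in for the two scores that are pooled.
    """
    mu_p, mu_t = np.asarray(mu_p, float), np.asarray(mu_t, float)
    cov_p, cov_t = np.asarray(cov_p, float), np.asarray(cov_t, float)
    cov = 0.5 * (cov_p + cov_t)          # average covariance
    diff = mu_p - mu_t
    # (1/8) * diff^T cov^{-1} diff, using solve instead of an explicit inverse
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)
    # (1/2) * ln(det(cov) / sqrt(det(cov_p) * det(cov_t))), in log space
    _, logdet = np.linalg.slogdet(cov)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_t = np.linalg.slogdet(cov_t)
    cov_term = 0.5 * (logdet - 0.5 * (logdet_p + logdet_t))
    return mean_term, cov_term

def cbse_like_score(pristine_features, test_features):
    """Pool the two terms by multiplication (hypothetical stand-in for the final score)."""
    s_mu, s_sigma = bhattacharyya_terms(*fit_mvg(pristine_features),
                                        *fit_mvg(test_features))
    return s_mu * s_sigma

# Toy usage with synthetic features (not real video features).
rng = np.random.default_rng(0)
pristine = rng.normal(size=(500, 4))
distorted = 2.0 * rng.normal(size=(500, 4)) + 1.0
score = cbse_like_score(pristine, distorted)
```

Under this reading, a test video whose feature distribution matches the pristine set in either mean or covariance gets a score of zero, and the score grows as both statistics drift away from the pristine model.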

Abstract

We present two major contributions in this work. 1) We create a full HD resolution stereoscopic (S3D) video dataset comprising 12 reference and 360 distorted videos. The test stimuli are produced by simulating five levels of fog and haze ambiance on the pristine left and right video sequences. We perform a subjective study on the created dataset with 24 viewers and compute Difference Mean Opinion Scores (DMOS) as the quality labels of the dataset. 2) We develop an Opinion Unaware (OU) and Distortion Unaware (DU) video quality assessment model for S3D videos. We construct cyclopean frames from the individual views of an S3D video and partition them into nonoverlapping blocks. We analyze the Natural Scene Statistics (NSS) of all patches of pristine and test videos, and empirically model the NSS features with a Univariate Generalized Gaussian Distribution (UGGD). We compute the UGGD model parameters (α, β) at multiple spatial scales and multiple orientations of a spherical steerable pyramid decomposition and show that the UGGD parameters are distortion-discriminative. Further, we perform Multivariate Gaussian (MVG) modeling on the pristine and distorted video feature sets and compute the corresponding mean vectors and covariance matrices of the MVG fits. We compute the Bhattacharyya distance between the mean vectors and between the covariance matrices to estimate the perceptual deviation of a test video from the pristine video set. Finally, we pool both distance measures to estimate the overall quality score of an S3D video. The performance of the proposed objective algorithm is verified on the popular IRCCYN, LFOVIAS3DPh1, and LFOVIAS3DPh2 S3D video datasets and on the proposed VAD stereo dataset. The algorithm delivers consistent performance across all datasets and is competitive with off-the-shelf 2D and 3D image and video quality assessment algorithms.
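
As an illustration of the UGGD fitting step, the sketch below estimates a shape and a scale parameter from zero-mean samples by moment matching. It assumes the density has the form $f(x) \propto \exp(-(|x|/\beta)^{\alpha})$ with α as the shape and β as the scale, which is a common parameterization but not necessarily the one used in the paper. Moment matching is a widely used estimator for generalized Gaussian parameters; the abstract does not state which estimator the authors use. The helper names `ggd_ratio` and `fit_uggd` are hypothetical, and the Gaussian samples stand in for subband coefficients.

```python
import math
import random

def ggd_ratio(alpha):
    """rho(alpha) = Gamma(2/a)^2 / (Gamma(1/a) * Gamma(3/a)), computed in log space.

    For a zero-mean generalized Gaussian this equals (E|x|)^2 / E[x^2].
    """
    return math.exp(2.0 * math.lgamma(2.0 / alpha)
                    - math.lgamma(1.0 / alpha)
                    - math.lgamma(3.0 / alpha))

def fit_uggd(samples, lo=0.1, hi=10.0, iters=60):
    """Moment-matching estimate of (alpha, beta) for zero-mean samples."""
    n = len(samples)
    mean_abs = sum(abs(v) for v in samples) / n
    mean_sq = sum(v * v for v in samples) / n
    target = mean_abs ** 2 / mean_sq
    # rho(alpha) increases with alpha, so bisection finds the matching shape.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ggd_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    # E[x^2] = beta^2 * Gamma(3/alpha) / Gamma(1/alpha), solved for beta.
    beta = math.sqrt(mean_sq * math.exp(math.lgamma(1.0 / alpha)
                                        - math.lgamma(3.0 / alpha)))
    return alpha, beta

# Toy usage: standard normal samples as a stand-in for subband coefficients.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
alpha_hat, beta_hat = fit_uggd(samples)
```

For Gaussian input the estimates should land near α = 2 and β = √2, since a unit-variance Gaussian is the special case $\exp(-(|x|/\sqrt{2})^{2})$ of this family. In a pipeline like the one described above, one such (α, β) pair per scale and orientation would be stacked into the feature vector that feeds the MVG fit.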

Paper Structure

This paper contains 24 sections, 15 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Illustration of the $100^{th}$ frame from the left view of pristine S3D video.
  • Figure 2: The variation of Spatial Index (SI) and Temporal Index (TI) scores of reference videos, and disparity SI and TI (DSI and DTI) scores of the corresponding pristine S3D videos.
  • Figure 3: Illustration of $100^{th}$ frame from the left view of fog and haze distorted videos of Level 1 to Level 5. Fog Aware Density Evaluator (FADE) represents the visibility score of a scene.
  • Figure 4: Illustration of DMOS scores variation.
  • Figure 5: Illustration of log-histograms of pristine and corresponding symmetrically distorted versions of fog and haze ambiance videos computed at first scale, $\theta = 0^{\circ}$ and $\Phi = (-90^{\circ}, 0^{\circ}, 90^{\circ})$ orientations.
  • ...and 3 more figures