ColorVideoVDP: A visual difference predictor for image, video and display distortions
Rafal K. Mantiuk, Param Hanji, Maliha Ashraf, Yuta Asano, Alexandre Chapiro
TL;DR
ColorVideoVDP introduces a fully differentiable, full-reference metric that jointly models color and spatiotemporal vision with a display-aware pipeline. Built on castleCSF and cross-channel masking in the Derrington-Krauskopf-Lennie space, it predicts Just-Objectionable-Differences (JOD) while producing per-channel distortion visualizations. Calibrated with XR-DAVID and existing SDR/HDR datasets, it demonstrates improved prediction accuracy across diverse content and XR artifacts, and supports applications in chroma subsampling, display-tolerance specifications, and perceptual optimization. The method fills a gap in video/image quality assessment by integrating color, temporal dynamics, and display characteristics into a single, interpretable, and differentiable framework, enabling perceptually guided design and optimization in modern displays and XR systems.
Abstract
ColorVideoVDP is a video and image quality metric that models spatial and temporal aspects of vision, for both luminance and color. The metric is built on novel psychophysical models of chromatic spatiotemporal contrast sensitivity and cross-channel contrast masking. It accounts for the viewing conditions and for the geometric and photometric characteristics of the display. It was trained to predict common video streaming distortions (e.g. video compression, rescaling, and transmission errors), as well as 8 new distortion types related to AR/VR displays (e.g. light source and waveguide non-uniformities). To address the latter application, we collected a novel XR-Display-Artifact-Video quality dataset (XR-DAVID), comprising 336 distorted videos. Extensive testing on XR-DAVID, as well as on several datasets from the literature, indicates a significant gain in prediction performance compared to existing metrics. ColorVideoVDP opens the door to many novel applications that require joint automated spatiotemporal assessment of luminance and color distortions, including video streaming, display specification and design, visual comparison of results, and perceptually-guided quality optimization.
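The abstract states that the metric accounts for viewing conditions and display geometry. A common way to summarize such geometry is the effective resolution in pixels per visual degree, which links pixel-domain content to the spatial frequencies seen by the eye. The sketch below is only an illustration of that standard calculation, not the authors' implementation: the function name `pixels_per_degree` and its parameters are hypothetical, and it assumes a flat display with square pixels viewed head-on at its center.

```python
import math


def pixels_per_degree(resolution_px, diagonal_inches, viewing_distance_m):
    """Approximate pixels per visual degree at the center of a flat display.

    Hypothetical helper for illustration; assumes square pixels and a
    viewer looking perpendicularly at the screen center.
    """
    width_px, height_px = resolution_px
    # Length of the screen diagonal measured in pixels.
    diagonal_px = math.hypot(width_px, height_px)
    # Physical diagonal converted from inches to meters.
    diagonal_m = diagonal_inches * 0.0254
    # Physical size of a single (square) pixel.
    pixel_size_m = diagonal_m / diagonal_px
    # Visual angle subtended by one pixel, in degrees.
    deg_per_pixel = math.degrees(
        2.0 * math.atan(pixel_size_m / (2.0 * viewing_distance_m))
    )
    return 1.0 / deg_per_pixel


if __name__ == "__main__":
    # Example: a 30-inch 3840x2160 display viewed from 0.6 m.
    ppd = pixels_per_degree((3840, 2160), 30.0, 0.6)
    print(f"{ppd:.1f} pixels per degree")
```

Moving the viewer farther away or shrinking the pixels raises the pixels-per-degree value, which pushes a given pixel-level distortion toward higher spatial frequencies, where contrast sensitivity is generally lower. This is one reason the same distorted video can be rated differently on different displays.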
