Subjective assessment of the impact of a content adaptive optimiser for compressing 4K HDR content with AV1
Vibhoothi, Angeliki Katsenou, François Pitié, Katarina Domijan, Anil Kokaram
TL;DR
This work investigates the subjective impact of per-clip content-adaptive $\lambda$ optimisation for AV1 on 4K HDR content, comparing perceptual scores with a range of objective metrics. It formulates the per-clip optimisation via the rate-distortion (RD) objective $J = D + \lambda R$, where $\lambda \approx A \cdot q_{dc}^2$, tunes clip-specific $\lambda$ multipliers using Powell's method, and reports BD-rate improvements on HDR sequences. Through a Double Stimulus Continuous Quality Scale (DSCQS) subjective study with 42 observers and seven 4K HDR clips, the study analyses expert vs non-expert differences and correlates subjective scores with metrics such as HDR-VDP-3 and VMAF, noting that film grain and ISO-noise influence judgements. Overall, the method yields modest perceptual gains (average MOS up by $5.19\%$, with bitrate savings of around $4.68\%$), with HDR-VDP-3 and VMAF showing the strongest correlations with subjective quality, and the work highlights the need for refined protocols to reduce variance in HDR subjective testing.
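The per-clip optimisation summarised above can be illustrated with a small sketch. This is not the authors' implementation: `rd_lambda`, `best_mode` and `tune_multiplier` are hypothetical helpers, the constant `a` (standing in for $A$) and the candidate $(D, R)$ pairs are made-up numbers, and a golden-section line search is swapped in for Powell's method. With a single multiplier per clip, Powell's method reduces to one line minimisation, which is the role the search plays here. In a real setup each call to `cost` would re-encode the clip with the scaled $\lambda$ and return a BD-rate against the default encoder; the sketch leaves that cost function to the caller.

```python
import math
from typing import Callable, Sequence, Tuple


def rd_lambda(q_dc: float, a: float, k: float = 1.0) -> float:
    """Lagrange multiplier lambda = k * A * q_dc**2, with k the per-clip multiplier."""
    return k * a * q_dc ** 2


def best_mode(candidates: Sequence[Tuple[str, float, float]], lam: float) -> str:
    """Return the name of the (name, D, R) candidate minimising J = D + lambda * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])[0]


def tune_multiplier(cost: Callable[[float], float], lo: float = 0.25,
                    hi: float = 4.0, tol: float = 1e-4) -> float:
    """Golden-section search for the k in [lo, hi] that minimises cost(k).

    The search runs over log2(k) so that halving and doubling lambda are
    treated symmetrically. With one parameter, Powell's method reduces to a
    single line minimisation, which this search stands in for. Assumes cost
    is unimodal on the interval.
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # ~0.618, the golden-section ratio
    left, right = math.log2(lo), math.log2(hi)
    x1 = right - inv_phi * (right - left)
    x2 = left + inv_phi * (right - left)
    f1, f2 = cost(2 ** x1), cost(2 ** x2)
    while right - left > tol:
        if f1 < f2:  # minimum lies in [left, x2]; old x1 becomes the new x2
            right, x2, f2 = x2, x1, f1
            x1 = right - inv_phi * (right - left)
            f1 = cost(2 ** x1)
        else:  # minimum lies in [x1, right]; old x2 becomes the new x1
            left, x1, f1 = x1, x2, f2
            x2 = left + inv_phi * (right - left)
            f2 = cost(2 ** x2)
    return 2 ** ((left + right) / 2)
```

Raising `k` scales $\lambda$ and pushes `best_mode` towards cheaper, more distorted candidates. As a usage example with a synthetic unimodal cost, `tune_multiplier(lambda k: (math.log2(k) - math.log2(1.3)) ** 2)` returns a value close to 1.3.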
Abstract
Since 2015, video dimensionality has expanded to higher spatial and temporal resolutions, a wider colour gamut and a higher dynamic range. Such High Dynamic Range (HDR) content has gained traction in the consumer space because it delivers an enhanced quality of experience. At the same time, codec complexity is growing, which has driven the development of content-adaptive optimisation tools that aim for optimal rate-distortion performance for 4K HDR video. While BD-Rate improvements of just a few percentage points (1-5\%) are significant for the streaming media industry, their impact on subjective quality has been less studied, especially for HDR content coded with AV1. In this paper, we conduct a subjective quality assessment (42 subjects) of 4K HDR content encoded with a per-clip optimisation strategy. We correlate these subjective scores with popular objective metrics used in standards development and show that some perceptual metrics correlate surprisingly well even though they are not tuned for HDR. We find that the DSCQS protocol is too insensitive to categorically compare the methods, but the data allow us to make recommendations about the use of experts vs non-experts in HDR studies and to explain the subjective impact of film grain in HDR content under compression.
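BD-Rate figures such as the 1-5\% range quoted above summarise the average bitrate difference between two RD curves at equal quality. The sketch below is a minimal pure-Python version of the classic Bjøntegaard calculation: fit a cubic of $\log_{10}$ rate against quality for each curve, average the gap over the quality range both curves cover, and convert back to a percentage. It assumes the original polynomial variant (the exact BD-Rate variant and quality metric used in the paper are not stated here), and the helper names `_cubic_fit`, `_cubic_integral` and `bd_rate` are hypothetical.

```python
import math
from typing import List, Sequence


def _cubic_fit(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Least-squares cubic y ~ c0 + c1*x + c2*x**2 + c3*x**3 (needs >= 4 points)."""
    n = 4
    # Normal equations (V^T V) c = V^T y for the Vandermonde matrix V,
    # stored as an augmented matrix.
    m = [[sum(xi ** (i + j) for xi in x) for j in range(n)]
         + [sum(yi * xi ** i for xi, yi in zip(x, y))] for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        rest = sum(m[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (m[i][n] - rest) / m[i][i]
    return coef


def _cubic_integral(coef: Sequence[float], lo: float, hi: float) -> float:
    """Definite integral of the cubic with coefficients coef over [lo, hi]."""
    def prim(t: float) -> float:
        return sum(c * t ** (k + 1) / (k + 1) for k, c in enumerate(coef))
    return prim(hi) - prim(lo)


def bd_rate(rate_ref: Sequence[float], q_ref: Sequence[float],
            rate_test: Sequence[float], q_test: Sequence[float]) -> float:
    """Bjontegaard delta rate (%) of test vs reference; negative = bitrate saving."""
    q_all = list(q_ref) + list(q_test)
    mid = (max(q_all) + min(q_all)) / 2
    half = (max(q_all) - min(q_all)) / 2

    def norm(q: float) -> float:
        # Shared affine map of quality to [-1, 1] keeps the fit well conditioned.
        return (q - mid) / half

    c_ref = _cubic_fit([norm(q) for q in q_ref], [math.log10(r) for r in rate_ref])
    c_test = _cubic_fit([norm(q) for q in q_test], [math.log10(r) for r in rate_test])
    # Average only over the quality range covered by both RD curves; the
    # affine map does not change the mean of the log-rate difference.
    lo = norm(max(min(q_ref), min(q_test)))
    hi = norm(min(max(q_ref), max(q_test)))
    diff = _cubic_integral(c_test, lo, hi) - _cubic_integral(c_ref, lo, hi)
    return (10 ** (diff / (hi - lo)) - 1) * 100
```

For four hypothetical RD points per curve where the test encoder needs 10\% less bitrate at every quality level, `bd_rate` returns -10.0 up to rounding; a negative value therefore reads as a bitrate saving.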
