Enhancing image quality prediction with self-supervised visual masking
Uğur Çoğalan, Mojtaba Bemana, Hans-Peter Seidel, Karol Myszkowski
TL;DR
This work addresses the mismatch between full-reference image quality metrics (FR-IQMs) and human perception by introducing a self-supervised visual masking mechanism. A lightweight CNN predicts a per-pixel mask $M$ from a reference image $X$ and a distorted image $Y$; the mask modulates both images before an FR-IQM $\mathcal{D}$ is applied, and a small mapping network $\mathcal{G}$ maps the metric output to the mean opinion score (MOS) $q$. The approach improves both classical and deep-feature IQMs on CSIQ, TID2013, and PIPAL, yields error maps that better reflect perceived distortion visibility, and improves denoising and deblurring results when the enhanced metrics are used as training losses. Because the method is lightweight and agnostic to the underlying metric, it is a candidate for practical use in restoration and compression workflows.
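The pipeline summarized above can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in this summary: the mask is applied by element-wise multiplication, plain MSE stands in for the metric $\mathcal{D}$, a hand-set mask replaces the learned CNN, and an affine function with made-up coefficients stands in for the mapping network $\mathcal{G}$. All function names are hypothetical.

```python
# Illustrative sketch only: the mask is hand-set, not predicted by a trained CNN.

def mse(x, y):
    """Mean squared error between two equally sized images (lists of rows)."""
    n = sum(len(row) for row in x)
    return sum((a - b) ** 2 for rx, ry in zip(x, y) for a, b in zip(rx, ry)) / n

def apply_mask(img, mask):
    """Modulate an image by a per-pixel mask (assumed element-wise product)."""
    return [[m * p for m, p in zip(mrow, irow)] for mrow, irow in zip(mask, img)]

def masked_metric(x, y, mask, metric=mse):
    """Apply the same mask to reference and distorted images, then the metric D."""
    return metric(apply_mask(x, mask), apply_mask(y, mask))

def map_to_mos(d, a=-1.0, b=1.0):
    """Hypothetical affine stand-in for G; a and b are arbitrary placeholders."""
    return a * d + b

X = [[0.0, 1.0], [0.0, 1.0]]      # reference image
Y = [[0.5, 1.0], [0.0, 0.5]]      # distorted image with two pixel errors
M = [[0.0, 1.0], [1.0, 1.0]]      # mask treating the top-left error as invisible

plain = mse(X, Y)                 # both errors count equally
masked = masked_metric(X, Y, M)   # the masked-out error no longer contributes
q_hat = map_to_mos(masked)        # predicted quality score under the toy mapping
```

In this toy case the unmasked MSE counts both errors, while the masked version ignores the error at the pixel where the mask is zero. In the self-supervised setting described above, the mask would not be set by hand; it would be whatever the CNN learns so that the mapped metric output matches the MOS $q$.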
Abstract
Full-reference image quality metrics (FR-IQMs) aim to measure the visual differences between a pair of reference and distorted images, with the goal of accurately predicting human judgments. However, existing FR-IQMs, including traditional ones like PSNR and SSIM and even perceptual ones such as HDR-VDP, LPIPS, and DISTS, still fall short in capturing the complexities and nuances of human perception. In this work, rather than devising a novel IQM, we seek to improve the perceptual quality of existing FR-IQMs. We achieve this by considering visual masking, an important characteristic of the human visual system that changes its sensitivity to distortions as a function of local image content. Specifically, for a given FR-IQM, we propose to predict a visual masking model that modulates the reference and distorted images so that visual errors are penalized according to their visibility. Since ground-truth visual masks are difficult to obtain, we demonstrate how they can be derived in a self-supervised manner based solely on mean opinion scores (MOS) collected from an FR-IQM dataset. Our approach results in enhanced FR-IQMs that are more in line with human judgments, both visually and quantitatively.
