IQNet: Image Quality Assessment Guided Just Noticeable Difference Prefiltering For Versatile Video Coding
Yu-Han Sun, Chiang Lo-Hsuan Lee, Tian-Sheuan Chang
TL;DR
This work tackles the challenge of efficiently coding video by learning a fine-grained JND prefiltering strategy guided by no-reference image quality assessment. It introduces an IQA-guided JND dataset that embeds coding effects and perceptual enhancements, and a lightweight network, IQNet, that predicts JND adjustments in luminance with only 3K parameters and works across QPs with a single model. The approach achieves substantial bitrate savings (up to 41% for all-intra and 53% for low-delay P; 15% and 19% on average) with negligible perceptual loss, while delivering higher perceptual quality (VMAF) than prior CNN-based methods at a far smaller model size. The method offers a practical, scalable solution for VVC prefiltering in real-time HD video applications, reducing the need for extensive subjective testing.
Abstract
Image prefiltering with just noticeable distortion (JND) improves coding efficiency in a visually lossless way by filtering out perceptually redundant information prior to compression. However, real JND cannot be modeled well by the inaccurate masking equations of traditional approaches or by the image-level subjective tests used in deep learning approaches. Thus, this paper proposes a fine-grained JND prefiltering dataset guided by image quality assessment for accurate block-level JND modeling. The dataset is constructed from decoded images to include coding effects and is also perceptually enhanced with block overlap and edge preservation. Furthermore, based on this dataset, we propose a lightweight JND prefiltering network, IQNet, which can be applied directly to different quantization cases with the same model and needs only 3K parameters. The experimental results show that the proposed approach, applied to Versatile Video Coding, yields maximum/average bitrate savings of 41%/15% and 53%/19% for the all-intra and low-delay P configurations, respectively, with negligible subjective quality loss. Our method demonstrates higher perceptual quality and a model size that is an order of magnitude smaller than previous deep learning methods.
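To make the idea of block-level JND prefiltering concrete, the following is a minimal sketch, not the method of this paper. It assumes a generic rule in which each luminance sample is pulled toward its block mean by at most a per-block JND threshold, so the block becomes smoother (cheaper to code) while no sample changes by more than the threshold. The function names, the 8x8 block size, and the hand-supplied `jnd_map` are all hypothetical; in the proposed approach the luminance adjustment would be predicted by IQNet, and block overlap and edge preservation are not modeled here.

```python
import numpy as np


def jnd_prefilter_block(block, jnd):
    """Pull each luma sample toward the block mean by at most `jnd` levels.

    Assumed generic rule (not taken from the paper): samples within `jnd`
    of the mean collapse to the mean; others move `jnd` levels toward it.
    """
    block = block.astype(np.float64)
    mean = block.mean()
    diff = block - mean
    filtered = np.where(np.abs(diff) <= jnd, mean, block - np.sign(diff) * jnd)
    return np.clip(np.rint(filtered), 0, 255).astype(np.uint8)


def prefilter_luma(luma, jnd_map, block=8):
    """Apply the block rule over a luma plane with non-overlapping blocks.

    `jnd_map[i, j]` is the (hypothetical) integer threshold for block (i, j).
    """
    out = luma.copy()
    height, width = luma.shape
    for y in range(0, height, block):
        for x in range(0, width, block):
            jnd = jnd_map[y // block, x // block]
            out[y:y + block, x:x + block] = jnd_prefilter_block(
                luma[y:y + block, x:x + block], jnd
            )
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    luma = rng.integers(0, 256, size=(16, 16), dtype=np.uint8)
    jnd_map = np.full((2, 2), 3)  # placeholder thresholds, one per 8x8 block
    filtered = prefilter_luma(luma, jnd_map)
    change = np.abs(filtered.astype(int) - luma.astype(int))
    print("largest per-sample change:", change.max())
```

With integer thresholds, the per-sample change is bounded by the block's JND value, which is the property that lets such a filter remove detail without (under the JND assumption) visible loss. A learned predictor replaces the fixed `jnd_map` with content-dependent values.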
