Bit Distribution Study and Implementation of Spatial Quality Map in the JPEG-AI Standardization
Panqi Jia, Jue Mao, Esin Koyuncu, A. Burakhan Koyuncu, Timofey Solovyev, Alexander Karabutov, Yin Zhao, Elena Alshina, Andre Kaup
TL;DR
This work analyzes the spatial bit distribution of the JPEG-AI VM and VVC intra by constructing Bit Distribution Maps (BDMs), and introduces a spatial quality index map $Q$ to enable region-wise quantization. The JPEG-AI VM operates on fixed $16\times16$ latent blocks, whereas VVC intra uses variable block sizes. The study finds that VVC exhibits higher BDM variance, which suggests that the gains of the JPEG-AI VM are not solely due to distribution flexibility. The method multiplies the latent tensor by the quality map, which is signaled through the integer index $Q$ with $Q_s = 2^{\frac{Q}{4}}$ and a predictor-based difference $\Delta Q = Q - Q_{pred}$. This enables ROI coding and improves PSNR-Y, with gains of up to about $0.45$ dB in some cases and bitrate changes of up to $7.1\%$. The results indicate that applying VVC-derived spatial strategies can further boost JPEG-AI VM performance, although encoder complexity poses practical constraints. Overall, spatial quality maps offer a viable route to narrow the gap between the JPEG-AI VM and VVC.
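The index-to-scale mapping and the predictive signaling described above can be illustrated with a minimal sketch. Only the formulas $Q_s = 2^{\frac{Q}{4}}$ and $\Delta Q = Q - Q_{pred}$ come from the text. The left-neighbour predictor, the function names, and the use of nested lists for a single latent channel are assumptions made for illustration and are not taken from the JPEG-AI VM.

```python
from typing import List

def index_to_scale(q_index: int) -> float:
    """Map an integer quality index Q to the scale Q_s = 2^(Q/4)."""
    return 2.0 ** (q_index / 4.0)

def left_predictor(row: List[int], x: int) -> int:
    """Hypothetical predictor (assumption): the previous index in the same
    row, or 0 at the start of a row."""
    return row[x - 1] if x > 0 else 0

def encode_deltas(q_map: List[List[int]]) -> List[List[int]]:
    """Compute Delta Q = Q - Q_pred for every position of the index map."""
    return [[row[x] - left_predictor(row, x) for x in range(len(row))]
            for row in q_map]

def decode_deltas(deltas: List[List[int]]) -> List[List[int]]:
    """Rebuild the index map by adding each delta to its prediction."""
    q_map = []
    for delta_row in deltas:
        row: List[int] = []
        for x, delta in enumerate(delta_row):
            row.append(delta + left_predictor(row, x))
        q_map.append(row)
    return q_map

def scale_latent(latent: List[List[float]],
                 q_map: List[List[int]]) -> List[List[float]]:
    """Multiply each latent element by the scale of its quality index."""
    return [[v * index_to_scale(q) for v, q in zip(l_row, q_row)]
            for l_row, q_row in zip(latent, q_map)]

q_map = [[0, 4, 4], [-4, 0, 8]]
deltas = encode_deltas(q_map)
restored = decode_deltas(deltas)
scaled = scale_latent([[1.0, 1.0, 1.0]], [[0, 4, -4]])
```

With this mapping, an index step of 4 doubles the scale, so neighbouring regions of similar quality yield small $\Delta Q$ values. If rounding follows the multiplication, a larger scale corresponds to finer effective quantization in that region.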
Abstract
There is currently high demand for neural network-based image compression codecs. These codecs employ non-linear transforms to create compact bit representations and enable faster coding speeds on devices than the hand-crafted transforms used in classical frameworks. The scientific and industrial communities are highly interested in these properties, which has led to the JPEG-AI standardization effort. The JPEG-AI verification model has been released and is currently under development for standardization. Utilizing neural networks, it can outperform the classic codec VVC intra by over 10% BD-rate at the base operating point. Researchers attribute this success to its flexible bit distribution in the spatial domain, in contrast to the VVC intra anchor, which is generated with a constant quality point. However, our study reveals that VVC intra displays a more adaptable bit distribution structure through its use of various block sizes. Based on these observations, we propose a spatial bit allocation method to optimize the bit distribution of the JPEG-AI verification model and enhance visual quality. Furthermore, by applying the VVC bit distribution strategy, the objective performance of the JPEG-AI verification model can be further improved, with a maximum gain of 0.45 dB in PSNR-Y.
