Video Quality Enhancement Using Deep Learning-Based Prediction Models for Quantized DCT Coefficients in MPEG I-frames
Antonio J G Busson, Paulo R C Mendes, Daniel de S Moraes, Álvaro M da Veiga, Álan L V Guedes, Sérgio Colcher
TL;DR
The paper tackles quality degradation in MPEG I-frames caused by lossy quantization. It introduces a frequency-domain deep learning approach that predicts missing DCT coefficients directly from the low-quality quantized data. Standard inverse quantization and reconstruction then follow, yielding higher-quality I-frames. Among the backbones evaluated, Res-UNet performs best: it achieves substantial SSIM and PSNR gains on the validation sets and shows the potential to approach the quality of $QF$ = 20–50 frames without re-encoding. Because the method runs on the decoder side and preserves MPEG compatibility, it can reduce bandwidth and storage needs. Future work includes attention mechanisms, transformers, and extensions to other MPEG standards and frame types.
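The TL;DR reports PSNR and SSIM gains. For reference, PSNR for 8-bit frames is conventionally $PSNR = 10 \log_{10}(255^2 / MSE)$ dB, where $MSE$ is the mean squared error between the reference frame and the enhanced frame. The minimal sketch below assumes this standard definition. This section does not say whether the paper computes PSNR on luma only or averages it per frame. SSIM is more involved; scikit-image provides it as `skimage.metrics.structural_similarity`.

```python
import numpy as np


def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped 8-bit images.

    Assumes the common definition 10 * log10(peak^2 / MSE) with peak = 255.
    """
    # Cast to float so uint8 subtraction cannot wrap around.
    ref = reference.astype(np.float64)
    tst = test.astype(np.float64)
    mse = np.mean((ref - tst) ** 2)
    if mse == 0:
        return float("inf")  # identical images: PSNR is unbounded
    return float(10.0 * np.log10(peak ** 2 / mse))
```

For example, a flat frame of value 200 reconstructed as 198 everywhere has an MSE of 4, which gives a PSNR of $10 \log_{10}(65025 / 4) \approx 42.1$ dB.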
Abstract
Recent works have successfully applied several types of Convolutional Neural Networks (CNNs) to reduce the visible distortion caused by lossy JPEG/MPEG compression. Most of them operate in the spatial domain. In this work, we propose an MPEG video decoder that works purely in the frequency domain (frequency to frequency). It reads the quantized DCT coefficients from a low-quality I-frame bitstream and uses a deep learning-based model to predict the missing coefficients, reconstructing the same frames at enhanced quality. In experiments on a video dataset, our best model enhanced frames whose quantized DCT coefficients correspond to a Quality Factor (QF) of 10 to a quality approaching that of QF 20.
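The decoder described above can be sketched for a single 8x8 luminance block. This is a minimal sketch under stated assumptions, not the authors' implementation. We assume JPEG-style quantization with the JPEG Annex K luminance table scaled by the IJG libjpeg quality rule, because the abstract does not say how QF maps to MPEG quantizer settings. We also assume an orthonormal 8x8 DCT-II. The function `predict_coefficients` is hypothetical: it stands in for the trained model (for example the Res-UNet) and is assumed to return coefficients on the same quantized grid as its input.

```python
import numpy as np

# JPEG Annex K luminance quantization table (the QF = 50 baseline).
BASE_LUMA_Q = np.array([
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
], dtype=np.int64)


def quant_table(qf: int) -> np.ndarray:
    """Scale the base table to quality factor qf (1..100), IJG libjpeg rule (assumed)."""
    qf = min(max(qf, 1), 100)
    scale = 5000 // qf if qf < 50 else 200 - 2 * qf
    return np.clip((BASE_LUMA_Q * scale + 50) // 100, 1, 255)


def dct_matrix(n: int = 8) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


C = dct_matrix()


def encode_block(block: np.ndarray, qf: int) -> np.ndarray:
    """Level-shift, 2-D DCT, and quantize: the coefficients a bitstream carries."""
    coeffs = C @ (block.astype(np.float64) - 128.0) @ C.T
    return np.round(coeffs / quant_table(qf)).astype(np.int64)


def predict_coefficients(q_coeffs: np.ndarray) -> np.ndarray:
    """HYPOTHETICAL stand-in for the trained frequency-to-frequency model.

    A real model would fill in coefficients that quantization zeroed out;
    this placeholder returns its input unchanged so the sketch runs.
    """
    return q_coeffs.astype(np.float64)


def decode_block(q_coeffs: np.ndarray, qf: int) -> np.ndarray:
    """Predict missing coefficients, then inverse-quantize and inverse DCT."""
    enhanced = predict_coefficients(q_coeffs)
    coeffs = enhanced * quant_table(qf)
    pixels = C.T @ coeffs @ C + 128.0
    return np.clip(np.round(pixels), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    block = rng.integers(0, 256, size=(8, 8))
    for qf in (10, 20, 50):
        # Fewer nonzero coefficients survive at lower QF: these are the "missing" ones.
        print(qf, np.count_nonzero(encode_block(block, qf)))
```

With the identity placeholder, `decode_block` reduces to plain dequantization plus inverse DCT. For instance, a flat block of value 200 comes back as 198 at QF 10 but exactly 200 at QF 50. The design keeps the learned step between entropy-decoded coefficients and inverse quantization, so the rest of the decoding path stays standard.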
