JPEG Quantized Coefficient Recovery via DCT Domain Spatial-Frequential Transformer
Mingyu Ouyang, Zhenzhong Chen
TL;DR
The paper tackles JPEG artifact removal by recovering the quantized DCT coefficients directly in the frequency domain. It introduces DCTransformer, a DCT-domain spatial-frequential Transformer built from dual-branch spatial-frequential Transformer blocks (SFTB), together with a quantization matrix embedding and a luminance-chrominance alignment head. The model is trained with a dual-domain loss that optimizes both pixel-domain and coefficient-domain fidelity. The stated contributions are the first Transformer-based approach to coefficient recovery in the DCT domain, a single model that handles luminance and chrominance components across a wide range of quality factors, and extensive experiments reporting state-of-the-art or competitive results on both pixel-domain and DCT-domain metrics, along with improved robustness and efficiency. For practical JPEG restoration, a single model can handle varying quality factors, and its memory and inference-time characteristics favor deployment on high-resolution images.
Abstract
JPEG compression quantizes Discrete Cosine Transform (DCT) coefficients for effective bit-rate reduction, but this quantization can cause a significant loss of important image details. Recovering compressed JPEG images in the frequency domain has recently garnered increasing interest, complementing the multitude of restoration techniques established in the pixel domain. However, existing DCT domain methods typically suffer from limited effectiveness in handling a wide range of compression quality factors, or fall short in recovering sparse quantized coefficients and the components across different colorspaces. To address these challenges, we propose a DCT domain spatial-frequential Transformer, namely DCTransformer, for JPEG quantized coefficient recovery. Specifically, a dual-branch architecture is designed to capture both spatial and frequential correlations within the collocated DCT coefficients. Moreover, we incorporate quantization matrix embedding, which allows our single model to handle a wide range of quality factors, and a luminance-chrominance alignment head that produces a unified feature map to align different-sized luminance and chrominance components. Our extensive experiments demonstrate that the proposed DCTransformer outperforms the current state-of-the-art JPEG artifact removal techniques.
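To make the quantization step described above concrete, the following is a minimal sketch of how a single 8x8 luminance block is transformed, quantized, and decoded. It is not the paper's pipeline or model. It assumes the standard luminance table from Annex K of the JPEG standard and the libjpeg (IJG) convention for scaling that table by a quality factor; a given encoder may use different tables. All function names (`scaled_qtable`, `jpeg_round_trip`, and so on) and the test block `pixels` are illustrative.

```python
import math

N = 8  # JPEG applies the DCT to 8x8 blocks

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K).
BASE_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]


def scaled_qtable(quality):
    """Scale the base table by a quality factor (assumed libjpeg convention)."""
    quality = max(1, min(100, quality))
    scale = 5000 // quality if quality < 50 else 200 - 2 * quality
    return [[max(1, min(255, (q * scale + 50) // 100)) for q in row]
            for row in BASE_LUMA_Q]


# Orthonormal DCT-II basis: row k is the k-th cosine basis vector.
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos((2 * n + 1) * k * math.pi / (2 * N))
      for n in range(N)] for k in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def dct2(block):
    """2-D DCT of an 8x8 block: C * X * C^T."""
    return matmul(matmul(C, block), transpose(C))


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * Y * C."""
    return matmul(matmul(transpose(C), coeffs), C)


def quantize(coeffs, qtable):
    # Python's round() rounds ties to even; real encoders may break ties differently.
    return [[round(c / q) for c, q in zip(crow, qrow)]
            for crow, qrow in zip(coeffs, qtable)]


def dequantize(levels, qtable):
    return [[v * q for v, q in zip(lrow, qrow)]
            for lrow, qrow in zip(levels, qtable)]


def jpeg_round_trip(pixels, quality):
    """Return (quantized integer levels, decoded pixels) for one 8x8 block."""
    qtable = scaled_qtable(quality)
    shifted = [[p - 128 for p in row] for row in pixels]  # level shift
    levels = quantize(dct2(shifted), qtable)
    recon = idct2(dequantize(levels, qtable))
    decoded = [[min(255, max(0, round(v + 128))) for v in row] for row in recon]
    return levels, decoded


# Illustrative block: a smooth gradient with a little texture.
pixels = [[100 + 8 * i + 4 * j + 6 * ((i * j) % 3) for j in range(N)]
          for i in range(N)]

for quality in (10, 50, 90):
    levels, decoded = jpeg_round_trip(pixels, quality)
    nonzero = sum(v != 0 for row in levels for v in row)
    mae = sum(abs(a - b) for ra, rb in zip(pixels, decoded)
              for a, b in zip(ra, rb)) / (N * N)
    print(f"QF={quality:3d}  nonzero coefficients={nonzero:2d}/64  "
          f"mean abs pixel error={mae:.2f}")
```

Lower quality factors enlarge the table entries, so more coefficients round to zero; this is the sparsity the abstract refers to. The integer `levels` are the kind of input a DCT-domain recovery method works from, and the table returned by `scaled_qtable` is the side information that changes with the quality factor, which is presumably what a quantization matrix embedding conditions on.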
