JPEG Quantized Coefficient Recovery via DCT Domain Spatial-Frequential Transformer

Mingyu Ouyang; Zhenzhong Chen

TL;DR

The paper addresses JPEG artifact restoration by recovering quantized DCT coefficients directly in the frequency domain. It introduces DCTransformer, a DCT-domain spatial-frequential Transformer built from dual-branch Spatial-Frequential Transformer Blocks (SFTB), with quantization matrix embedding and a luminance-chrominance alignment head, trained with a dual-domain loss that targets both pixel-domain and coefficient-domain fidelity. The stated contributions are a Transformer-based approach to coefficient recovery in the DCT domain, described as the first of its kind; a single model that handles luminance and chrominance components across a wide range of quality factors; and extensive experiments reporting state-of-the-art or competitive results on both pixel-domain and DCT-domain metrics, with improved robustness. Its reported memory use and inference time also make it practical for JPEG restoration on high-resolution images.

Abstract

JPEG compression adopts the quantization of Discrete Cosine Transform (DCT) coefficients for effective bit-rate reduction, whilst the quantization could lead to a significant loss of important image details. Recovering compressed JPEG images in the frequency domain has recently garnered increasing interest, complementing the multitude of restoration techniques established in the pixel domain. However, existing DCT domain methods typically suffer from limited effectiveness in handling a wide range of compression quality factors or fall short in recovering sparse quantized coefficients and the components across different colorspaces. To address these challenges, we propose a DCT domain spatial-frequential Transformer, namely DCTransformer, for JPEG quantized coefficient recovery. Specifically, a dual-branch architecture is designed to capture both spatial and frequential correlations within the collocated DCT coefficients. Moreover, we incorporate the operation of quantization matrix embedding, which effectively allows our single model to handle a wide range of quality factors, and a luminance-chrominance alignment head that produces a unified feature map to align different-sized luminance and chrominance components. Our proposed DCTransformer outperforms the current state-of-the-art JPEG artifact removal techniques, as demonstrated by our extensive experiments.
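The information loss that the abstract attributes to quantization can be illustrated with a small sketch. It is not the authors' code: it applies an orthonormal 8x8 DCT to a synthetic gradient block, quantizes with a hypothetical toy matrix `toy_qm` (an assumption for illustration, not the standard JPEG table or any table used in the paper), and counts how many coefficients survive.

```python
import math

N = 8  # JPEG operates on 8x8 blocks


def dct2(block):
    """Orthonormal 2-D DCT-II of an N x N block given as a list of lists."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

    def basis(x, u):
        return math.cos((2 * x + 1) * u * math.pi / (2 * N))

    return [
        [
            alpha(u) * alpha(v) * sum(
                block[x][y] * basis(x, u) * basis(y, v)
                for x in range(N) for y in range(N)
            )
            for v in range(N)
        ]
        for u in range(N)
    ]


def quantize(coeffs, qm):
    """Divide each coefficient by its quantization step and round."""
    return [[round(coeffs[u][v] / qm[u][v]) for v in range(N)] for u in range(N)]


def dequantize(levels, qm):
    """Multiply the integer levels back by the quantization steps."""
    return [[levels[u][v] * qm[u][v] for v in range(N)] for u in range(N)]


# Hypothetical toy quantization matrix: the step grows with frequency.
# This is an illustrative assumption, not a table from the JPEG standard.
toy_qm = [[10 + 10 * (u + v) for v in range(N)] for u in range(N)]

# Synthetic smooth gradient block, level-shifted by 128 as JPEG does.
pixels = [[8 * x + 4 * y - 128 for y in range(N)] for x in range(N)]

coeffs = dct2(pixels)
levels = quantize(coeffs, toy_qm)
recovered = dequantize(levels, toy_qm)

nonzero = sum(1 for row in levels for q in row if q != 0)
max_err = max(
    abs(coeffs[u][v] - recovered[u][v]) for u in range(N) for v in range(N)
)
print(f"non-zero quantized coefficients: {nonzero} of {N * N}")
print(f"largest coefficient error after dequantization: {max_err:.2f}")
```

Each coefficient can be off by up to half its quantization step after dequantization, and small high-frequency coefficients round to zero. Estimating the lost values is the recovery problem the paper addresses.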

Paper Structure (15 sections, 19 equations, 9 figures, 6 tables)

INTRODUCTION
RELATED WORKS
PROPOSED METHOD
The Overall Framework
QM Embedding and DCT Coefficient Rearrangement
Luminance-Chrominance Alignment Head
DCTransformer Body
Dual-domain Loss Function
EXPERIMENTS
Implementation Details
Pixel Domain Evaluation
DCT Domain Evaluation
Comparison of Number of Parameters, Runtime, and Maximum GPU Memory Consumption
Ablation Studies
CONCLUSION

Figures (9)

Figure 1: Visualizations of the quantization and recovery of DCT coefficients and corresponding images. (a)-(b) The original image and its lossless DCT coefficients. (c)-(d) Compressed JPEG at QF = 10 and its highly sparse quantized coefficients. (e)-(f) Recovered coefficients of our DCTransformer and the reconstructed image. Note that only the coefficients of the Y channel are presented.
Figure 2: An overview of the pipeline of the proposed DCTransformer for JPEG quantized coefficient recovery. DCTransformer consists of four modules: a pre-processing module, a luminance-chrominance alignment head, the DCTransformer body, and a post-correlation module. The pre-processing module applies quantization matrix embedding and coefficient rearrangement to prepare collocated DCT coefficients. The luminance-chrominance alignment head then unifies the different-sized Y and CbCr coefficients into a single feature map, which is fed into the DCTransformer body and processed by several Spatial-Frequential Transformer Blocks (SFTB). Finally, the post-correlation module reconstructs the full image from the recovered coefficients.
Figure 3: The schematic illustrations of a) Decoded DCT coefficient blocks. b) The proposed quantization matrix embedding. c) The rearrangement of collocated DCT coefficients. The resulting collocated DCT coefficients have an intrinsic correlation in both spatial and frequential dimensions.
Figure 4: The architecture of our Spatial-Frequential Transformer Block (SFTB). a) Illustration of the tokenization in spatial-wise and frequency-wise self-attention to extract diverse correlations. Note that multi-head design and learnable biases have been omitted for simplicity. b) The dual-branch architecture of SFTB. Each SFTB consists of two attention branches: spatial attention branch and frequential attention branch. The outputs of two branches are then channel concatenated and passed through a 3 $\times$ 3 convolutional layer.
Figure 5: Recovery comparisons on "LIVE1: buildings.bmp" at JPEG compression quality factor 10. (a) Compressed JPEG, PSNR 23.63 dB, SSIM 0.790. (b) DnCNN-3, PSNR 24.93 dB, SSIM 0.753. (c) QGAC, PSNR 25.78 dB, SSIM 0.815. (d) FBCNN, PSNR 25.99 dB, SSIM 0.814. (e) Our DCTransformer, PSNR 26.23 dB, SSIM 0.820. (f) The original image. Note that our method gives a less over-smoothed result with more natural textures (on the window shutters and the rooftop) and less incorrect color blurring (the red tint above the sky in (a), (b), and (d)). Please zoom in to view the details.
...and 4 more figures
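
The rearrangement into collocated DCT coefficients mentioned in the Figure 3 caption can be sketched as follows. This is one plausible reading of the caption, not the authors' implementation: coefficients that share the same frequency position (u, v) in every 8x8 block are gathered into one map, giving 64 frequency maps whose spatial axes index the blocks. The function names `rearrange_collocated` and `restore_blocks` are hypothetical.

```python
def rearrange_collocated(plane, n=8):
    """Gather same-frequency coefficients from every n x n block.

    plane: H x W list of lists of block-wise DCT coefficients, with H and W
    multiples of n. Returns n*n maps, each of size (H/n) x (W/n); map index
    k = u * n + v holds frequency (u, v) of every block.
    """
    h, w = len(plane), len(plane[0])
    if h % n or w % n:
        raise ValueError("plane dimensions must be multiples of the block size")
    bh, bw = h // n, w // n
    return [
        [[plane[i * n + u][j * n + v] for j in range(bw)] for i in range(bh)]
        for u in range(n)
        for v in range(n)
    ]


def restore_blocks(maps, n=8):
    """Inverse of rearrange_collocated: scatter the maps back into blocks."""
    bh, bw = len(maps[0]), len(maps[0][0])
    plane = [[0] * (bw * n) for _ in range(bh * n)]
    for k, fmap in enumerate(maps):
        u, v = divmod(k, n)
        for i in range(bh):
            for j in range(bw):
                plane[i * n + u][j * n + v] = fmap[i][j]
    return plane


# Toy 16 x 24 plane (2 x 3 blocks) whose values encode their own position.
plane = [[r * 100 + c for c in range(24)] for r in range(16)]
maps = rearrange_collocated(plane)
print(len(maps), len(maps[0]), len(maps[0][0]))  # 64 maps of size 2 x 3
print(maps[0])  # the (0, 0) coefficient of every block
```

In this layout, neighbouring entries within one map are the same frequency in adjacent blocks, and entries at the same position across maps are different frequencies of one block, which matches the caption's remark about correlation in both spatial and frequential dimensions.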
