Reversing the Damage: A QP-Aware Transformer-Diffusion Approach for 8K Video Restoration under Codec Compression

Ali Mollaahmadi Dehaghi, Reza Razavi, Mohammad Moshirpour

TL;DR

This work tackles 8K video restoration under codec compression by introducing DiQP, a QP-aware Transformer-Diffusion model that directly models codec artifacts without adding artificial noise. The architecture fuses a U-Net–style Transformer backbone with Look Ahead, Look Around, and LOST embeddings to capture long-range spatiotemporal dependencies while conditioning on compression parameters. Empirical results on SEPE8K and UVG demonstrate clear PSNR and SSIM gains over competitive baselines for AV1 and HEVC across multiple QPs, highlighting DiQP’s effectiveness in recovering fine textures and reducing compression artifacts. The approach promises practical impact for high-resolution video delivery by enabling perceptually pleasing restoration of heavily compressed 8K content, with potential extension to other resolutions via refined conditioning schemes.

Abstract

In this paper, we introduce DiQP, a novel Transformer-Diffusion model for restoring 8K video quality degraded by codec compression. To the best of our knowledge, our model is the first to restore the artifacts introduced by different codecs (AV1, HEVC) with Denoising Diffusion without adding extra noise. This approach allows us to model the complex, non-Gaussian nature of compression artifacts, effectively learning to reverse the degradation. Our architecture combines the power of Transformers to capture long-range dependencies with an enhanced windowed mechanism that preserves spatiotemporal context within groups of pixels across frames. To further enhance restoration, the model incorporates auxiliary "Look Ahead" and "Look Around" modules, which provide future and surrounding frame information to aid in reconstructing fine details and improving overall visual quality. Extensive experiments on different datasets demonstrate that our model outperforms state-of-the-art methods, particularly for high-resolution videos such as 4K and 8K, showing its effectiveness in restoring perceptually pleasing videos from highly compressed sources.

Paper Structure

This paper contains 16 sections, 5 equations, 8 figures, 6 tables, 1 algorithm.

Figures (8)

  • Figure 1: As the QP increases, the overall quality decreases, leading to more noticeable artifacts. Pixels are affected differently relative to each other. We calculated the Mean Absolute Difference and PSNR for these four quality levels and also present a heatmap of the affected areas.
  • Figure 2: The overall architecture of the proposed model.
  • Figure 3: The architectures of the Look Around and Look Ahead models.
  • Figure 4: Frame difference analysis reveals the most significant changes at a temporal window size (T) of 50, indicating it as the optimal size for the Look Ahead model.
  • Figure 5: LOST embedding
  • ...and 3 more figures
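
The Figure 1 caption mentions a Mean Absolute Difference, a PSNR, and a heatmap of affected areas computed between an original frame and its compressed version. The following is a minimal sketch of how such quantities are commonly computed, not the authors' code. It assumes single-channel 8-bit frames stored as nested lists, and the function names (`abs_diff_map`, `mean_absolute_difference`, `psnr`) are hypothetical; a real 8K pipeline would more likely use an array library.

```python
import math


def abs_diff_map(reference, degraded):
    """Per-pixel absolute difference; these values can be rendered as a heatmap."""
    return [
        [abs(r - d) for r, d in zip(ref_row, deg_row)]
        for ref_row, deg_row in zip(reference, degraded)
    ]


def mean_absolute_difference(reference, degraded):
    """Average of the per-pixel absolute differences over the whole frame."""
    diff = abs_diff_map(reference, degraded)
    pixel_count = sum(len(row) for row in diff)
    return sum(sum(row) for row in diff) / pixel_count


def psnr(reference, degraded, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit samples."""
    pixel_count = sum(len(row) for row in reference)
    squared_error = sum(
        (r - d) ** 2
        for ref_row, deg_row in zip(reference, degraded)
        for r, d in zip(ref_row, deg_row)
    )
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 2x2 "frames" (illustrative values only, not data from the paper).
reference = [[100, 100], [100, 100]]
degraded = [[100, 102], [98, 100]]

heatmap = abs_diff_map(reference, degraded)
mad = mean_absolute_difference(reference, degraded)  # 1.0 for these toy frames
quality_db = psnr(reference, degraded)
```

Evaluating the same functions on frames encoded at increasing QP values would be expected to show the trend the caption describes: a growing Mean Absolute Difference and a falling PSNR, with the heatmap indicating which pixels are affected most.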