Super Efficient Neural Network for Compression Artifacts Reduction and Super Resolution

Wen Ma, Qiuwen Lou, Arman Kazemi, Julian Faraone, Tariq Afzal

TL;DR

The paper addresses bandwidth-constrained video delivery by introducing ARSR, a lightweight CNN that simultaneously reduces compression artifacts and upscales video from single frames. Building on SESR with ARCNN-inspired artifact mitigation, ARSR uses over-parameterization during training and depth-to-space upscaling to achieve hardware-friendly inference while processing only the Y channel. Key contributions include a compact model (~22K parameters), single-frame processing, and VMAF-based validation showing 4–6 point gains over Lanczos/Bicubic at low bitrates; it also demonstrates favorable efficiency versus heavier models like BasicVSR++. The work enables practical edge deployment for real-time video enhancement by delivering improved perceptual quality with minimal computational burden.

Abstract

Video quality can suffer when limited internet speed constrains streaming. Compression artifacts appear when the bitrate is lowered to match the available bandwidth. Existing algorithms either remove compression artifacts at the original video resolution or upscale the video without removing the artifacts. Super resolution-only approaches therefore amplify the artifacts along with the details. We propose a lightweight convolutional neural network (CNN)-based algorithm that simultaneously performs artifacts reduction and super resolution (ARSR) by enhancing the feature extraction layers and designing a custom training dataset. The network output is evaluated on test streams compressed at low bitrates using variable bitrate (VBR) encoding. The output video quality shows a 4-6 point increase in video multi-method assessment fusion (VMAF) score compared to traditional interpolation upscaling approaches such as Lanczos or Bicubic.

Paper Structure (12 sections, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Model architecture of the ARSR network. The low resolution (LR) frame goes through $N$ Conv2d layers for feature extraction, $M$ Conv2d layers for non-linear mapping, a Conv2d layer to match the upscaling factor, and a final depth-to-space layer to upscale the frame to super resolution (SR). For later experiments, unless specified, $N = 3$ and $M = 11$ are used.
  • Figure 2: During training, the Conv2d operation is expanded into two sequential Conv2d operations. In this example, the numbers of input and output channels are both 16, and the number of internal expanded channels is 256.
  • Figure 3: Examples from the Vimeo dataset for training.
  • Figure 4: Comparison between different loss functions during training.
  • Figure 5: Using different numbers of feature extraction layers to reduce artifacts.
  • ...and 3 more figures
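
The captions for Figures 1 and 2 describe two mechanisms that a short sketch may make more concrete: folding the two training-time Conv2d operations back into a single kernel for inference, and the final depth-to-space rearrangement. The pure-Python code below is only an illustration under several assumptions that the captions do not confirm: the expanded pair is taken to be a k x k convolution followed by a 1 x 1 convolution with no bias and no non-linearity in between, the depth-to-space channel ordering is row-major over the r x r block, and toy channel counts (2 and 8) stand in for the 16 and 256 of Figure 2. All function names are hypothetical.

```python
import random

def rand_tensor(*shape):
    """Nested lists of the given shape filled with uniform random values."""
    if len(shape) == 1:
        return [random.uniform(-1.0, 1.0) for _ in range(shape[0])]
    return [rand_tensor(*shape[1:]) for _ in range(shape[0])]

def conv2d(x, w):
    """Valid cross-correlation without bias.
    x: [C_in][H][W], w: [C_out][C_in][k][k] -> [C_out][H-k+1][W-k+1]."""
    c_out, c_in, k = len(w), len(w[0]), len(w[0][0])
    h_out, w_out = len(x[0]) - k + 1, len(x[0][0]) - k + 1
    return [[[sum(w[o][i][a][b] * x[i][r + a][c + b]
                  for i in range(c_in) for a in range(k) for b in range(k))
              for c in range(w_out)]
             for r in range(h_out)]
            for o in range(c_out)]

def collapse(w1, w2):
    """Fold a k x k conv (w1) followed by a 1 x 1 conv (w2) into one k x k kernel.
    Assumes nothing non-linear sits between the two convolutions."""
    c_out, c_mid, c_in, k = len(w2), len(w1), len(w1[0]), len(w1[0][0])
    return [[[[sum(w2[o][p][0][0] * w1[p][i][a][b] for p in range(c_mid))
               for b in range(k)]
              for a in range(k)]
             for i in range(c_in)]
            for o in range(c_out)]

def depth_to_space(feat, r):
    """Rearrange [r*r][H][W] feature maps into one [H*r][W*r] frame (Y channel).
    Assumed ordering: channel i*r + j fills offset (i, j) of each r x r block."""
    assert len(feat) == r * r
    h, w = len(feat[0]), len(feat[0][0])
    return [[feat[(y % r) * r + (x % r)][y // r][x // r]
             for x in range(w * r)]
            for y in range(h * r)]

random.seed(0)
x = rand_tensor(2, 5, 5)       # 2 input channels, 5 x 5 frame
w1 = rand_tensor(8, 2, 3, 3)   # expand 2 -> 8 channels with a 3 x 3 kernel
w2 = rand_tensor(2, 8, 1, 1)   # project 8 -> 2 channels with a 1 x 1 kernel

two_step = conv2d(conv2d(x, w1), w2)     # training-time form
one_step = conv2d(x, collapse(w1, w2))   # collapsed inference-time form
max_err = max(abs(a - b)
              for p, q in zip(two_step, one_step)
              for ra, rb in zip(p, q)
              for a, b in zip(ra, rb))
print("max difference between expanded and collapsed outputs:", max_err)
```

Because both convolutions are linear, the collapsed kernel reproduces the two-step output up to floating-point rounding, so the extra expanded channels would add cost only during training. The depth-to-space step is a pure index rearrangement with no arithmetic, which is consistent with the hardware-friendly inference the TL;DR describes.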