ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network

Chih-Chia Hsu, Tian-Sheuan Chang

TL;DR

This work tackles real-time 8K super-resolution on edge devices by introducing ESSR, an edge-selective dynamic SR network that chooses among three subnets based on simple input edge cues. The authors jointly optimize network design and hardware: weights are shared across subnets, the network is modified for hardware (including SFBs and DSConv/BSConv replacements), and a resource-adaptive switching strategy is added, all implemented on a GLNPU that maps groups of layers to PEs for high utilization. ESSR achieves an 84% reduction in parameters and an 83% reduction in MACs with less than 0.6dB PSNR loss, while delivering 8K@30FPS at 800MHz with 0.2075W, which corresponds to 4797 Mpixels/J. The approach also delivers substantial hardware efficiency gains, including up to 79% reduction in feature SRAM accesses and 77% PE utilization, demonstrating strong potential for edge deployment of ultra-high-resolution SR. The methodology is extensible to other SR architectures and emphasizes a favorable trade-off between reconstruction quality, hardware cost, and throughput.

Abstract

Deep learning-based super-resolution (SR) is challenging to implement in resource-constrained edge devices for resolutions beyond full HD due to its high computational complexity and memory bandwidth requirements. This paper introduces an 8K@30FPS SR accelerator with edge-selective dynamic input processing. Dynamic processing chooses the appropriate subnet for each patch based on simple input edge criteria, achieving a 50% MAC reduction with only a 0.1dB PSNR decrease. Resource adaptive model switching guarantees the quality of the reconstructed images and maximizes it even under resource constraints. In conjunction with hardware-specific refinements, the model size is reduced by 84% to 51K parameters, with a PSNR decrease of less than 0.6dB. Additionally, to support dynamic processing with high utilization, this design incorporates a configurable group of layer mapping that synergizes with the structure-friendly fusion block, resulting in 77% hardware utilization and up to 79% reduction in feature SRAM access. The implementation, using the TSMC 28nm process, achieves 8K@30FPS throughput at 800MHz with a gate count of 2749K, 0.2075W power consumption, and 4797 Mpixels/J energy efficiency, exceeding previous work.
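The energy-efficiency figure in the abstract can be checked from the throughput and power numbers it quotes. The short calculation below is a sketch that assumes "8K" means a 7680x4320 output frame (the abstract does not state the exact resolution); the variable names are illustrative only.

```python
# Sanity check of the reported energy efficiency.
# Assumption: "8K" is 7680x4320 output pixels per frame (8K UHD).
WIDTH, HEIGHT = 7680, 4320
FPS = 30                 # frames per second, from the abstract
POWER_W = 0.2075         # power consumption in watts, from the abstract
CLOCK_HZ = 800e6         # operating frequency, from the abstract

# Output throughput in megapixels per second.
mpixels_per_second = WIDTH * HEIGHT * FPS / 1e6

# Energy efficiency: megapixels produced per joule (1 W = 1 J/s).
mpixels_per_joule = mpixels_per_second / POWER_W

# Clock cycles available per output pixel at this throughput.
cycles_per_pixel = CLOCK_HZ / (WIDTH * HEIGHT * FPS)

print(f"{mpixels_per_second:.3f} Mpixels/s")
print(f"{mpixels_per_joule:.0f} Mpixels/J")
print(f"{cycles_per_pixel:.3f} cycles per output pixel")
```

Under this assumption the result rounds to the 4797 Mpixels/J quoted in the abstract, and it shows that the design has less than one clock cycle per output pixel, which is why high PE utilization matters.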

Paper Structure

This paper contains 33 sections, 26 figures, 12 tables, 1 algorithm.

Figures (26)

  • Figure 1: Inference of the edge selective dynamic input processing. Green patch: Bilinear interpolation. Yellow patch: C27. Red patch: C54.
  • Figure 2: The subnet types of the ESSR when compared to ARM.
  • Figure 3: The comparison of the C16 and C27. The image is from Set5 butterfly.
  • Figure 4: The relation between edge score and bilinear, C27, C54, GAN-based C27, and GAN-based C54. Higher values of PSNR and SSIM indicate better performance, while lower values of LPIPS indicate better performance in terms of perceptual similarity.
  • Figure 5: Proposed subnet decision with the input edge threshold .
  • ...and 21 more figures
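
The captions of Figures 1 and 5 describe a per-patch decision: low-edge patches go to bilinear interpolation, and patches with more edges go to the C27 or C54 subnet according to an input edge threshold. A minimal sketch of that routing idea follows. The edge metric (mean absolute neighbor difference), the function names, and both threshold values are assumptions made for illustration; they are not taken from the paper.

```python
# Illustrative patch routing by edge score.
# Assumptions (not from the paper): the edge metric, the names
# edge_score/select_subnet, and the thresholds t_low and t_high.

def edge_score(patch):
    """Mean absolute difference between horizontally and vertically
    adjacent pixels of a 2D grayscale patch (list of rows)."""
    h, w = len(patch), len(patch[0])
    total, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:  # horizontal neighbor
                total += abs(patch[y][x + 1] - patch[y][x])
                count += 1
            if y + 1 < h:  # vertical neighbor
                total += abs(patch[y + 1][x] - patch[y][x])
                count += 1
    return total / count if count else 0.0


def select_subnet(score, t_low=2.0, t_high=10.0):
    """Map an edge score to one of the three processing paths."""
    if score < t_low:
        return "bilinear"  # flat patch: cheapest path
    if score < t_high:
        return "C27"       # moderate edges: smaller subnet
    return "C54"           # strong edges: larger subnet


flat = [[128] * 4 for _ in range(4)]
ramp = [[8 * x for x in range(4)] for _ in range(4)]
checker = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]

for name, patch in [("flat", flat), ("ramp", ramp), ("checker", checker)]:
    print(name, select_subnet(edge_score(patch)))
```

Because the decision uses only the input patch, it can be made before any convolution runs, so flat regions skip the network entirely. In practice the thresholds would be tuned against the quality and MAC budget.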