ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network

Chih-Chia Hsu; Tian-Sheuan Chang

TL;DR

This work tackles real-time 8K super-resolution on edge devices by introducing ESSR, an edge-selective dynamic SR network that chooses among three subnets for each patch based on simple input edge cues. The authors jointly optimize the network design and the hardware: weights are shared across subnets, the network is modified to suit the hardware (including structure-friendly fusion blocks (SFBs) and DSConv/BSConv replacements), and a resource-adaptive switching strategy is added, all implemented on a GLNPU that maps groups of layers to PEs for high utilization. ESSR achieves an 84% reduction in parameters and an 83% reduction in MACs with less than 0.6dB PSNR loss, while delivering 8K@30FPS at 800MHz with 0.2075W, which equates to 4797 Mpixels/J. The design also improves hardware efficiency, with up to 79% fewer feature SRAM accesses and 77% PE utilization, showing strong potential for edge deployment of ultra-high-resolution SR. The methodology is extensible to other SR architectures and emphasizes a favorable trade-off between reconstruction quality, hardware cost, and throughput.
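The edge-selective routing described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the edge measure (mean absolute difference between neighbouring pixels), the two thresholds, and the subnet labels "small", "medium", and "large" are hypothetical and are not taken from the paper, which states only that simple input edge criteria select among the subnets.

```python
def edge_score(patch):
    """Mean absolute difference between neighbouring pixels of a 2D patch.

    `patch` is a list of equal-length rows of grayscale values.
    This is an assumed, illustrative edge measure, not the paper's criterion.
    """
    height, width = len(patch), len(patch[0])
    total, count = 0.0, 0
    for y in range(height):
        for x in range(width):
            if x + 1 < width:   # horizontal neighbour
                total += abs(patch[y][x + 1] - patch[y][x])
                count += 1
            if y + 1 < height:  # vertical neighbour
                total += abs(patch[y + 1][x] - patch[y][x])
                count += 1
    return total / count if count else 0.0


def select_subnet(patch, low_thr=2.0, high_thr=10.0):
    """Route a patch to one of three subnets by its edge score.

    The thresholds and subnet names are hypothetical placeholders.
    """
    score = edge_score(patch)
    if score < low_thr:
        return "small"    # flat patch: cheapest subnet
    if score < high_thr:
        return "medium"   # moderate detail
    return "large"        # strong edges: full-capacity subnet


flat = [[128] * 4 for _ in range(4)]
ramp = [[0, 5, 10, 15] for _ in range(4)]
checker = [[255 * ((x + y) % 2) for x in range(4)] for y in range(4)]

for name, patch in [("flat", flat), ("ramp", ramp), ("checker", checker)]:
    print(name, select_subnet(patch))
```

Because the subnets share weights, a router of this kind only has to decide how much of the shared network each patch passes through, so flat regions cost fewer MACs than edge-rich ones.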

Abstract

Deep learning-based super-resolution (SR) is challenging to implement in resource-constrained edge devices for resolutions beyond full HD due to its high computational complexity and memory bandwidth requirements. This paper introduces an 8K@30FPS SR accelerator with edge-selective dynamic input processing. Dynamic processing chooses the appropriate subnets for different patches based on simple input edge criteria, achieving a 50% MAC reduction with only a 0.1dB PSNR decrease. Even under resource constraints, resource adaptive model switching guarantees the quality of the reconstructed images and maximizes it within the available resources. In conjunction with hardware-specific refinements, the model size is reduced by 84% to 51K parameters, with a PSNR decrease of less than 0.6dB. Additionally, to support dynamic processing with high utilization, this design incorporates a configurable group of layer mapping that works together with the structure-friendly fusion block, resulting in 77% hardware utilization and up to 79% reduction in feature SRAM access. The implementation, using the TSMC 28nm process, achieves 8K@30FPS throughput at 800MHz with a gate count of 2749K, 0.2075W power consumption, and 4797Mpixels/J energy efficiency, exceeding previous work.
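The reported energy efficiency can be cross-checked from the throughput and power figures. The short calculation below assumes that "8K" means an output frame of 7680 x 4320 pixels; that resolution is an assumption, since the abstract does not spell it out.

```python
# Cross-check of the reported Mpixels/J figure.
# Assumption: 8K output frames are 7680 x 4320 pixels.
WIDTH, HEIGHT = 7680, 4320
FPS = 30          # frames per second, from the abstract
POWER_W = 0.2075  # power consumption in watts, from the abstract


def throughput_mpixels_per_s(width, height, fps):
    """Output pixel rate in megapixels per second."""
    return width * height * fps / 1e6


def efficiency_mpixels_per_joule(width, height, fps, power_w):
    """Energy efficiency: megapixels produced per joule consumed."""
    return throughput_mpixels_per_s(width, height, fps) / power_w


throughput = throughput_mpixels_per_s(WIDTH, HEIGHT, FPS)
efficiency = efficiency_mpixels_per_joule(WIDTH, HEIGHT, FPS, POWER_W)

print(f"throughput: {throughput:.3f} Mpixels/s")
print(f"efficiency: {efficiency:.0f} Mpixels/J")
```

Under that assumption the throughput is about 995.3 Mpixels/s, and dividing by 0.2075W gives roughly 4797 Mpixels/J, which matches the figure quoted in the abstract.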
