Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
Gen Li, Zhihao Shu, Jie Ji, Minghai Qin, Fatemeh Afghah, Wei Niu, Xiaolong Ma
TL;DR
This work targets the model-switching overhead in overfitting-based video SR by proposing Dy-DCA, a content-aware dynamic DNN that reduces the number of deployed models to one. It couples a coarse-to-fine patching strategy with a routing-based dynamic SR network and an accompanying compiler framework (operator classification, data-flow analysis, fusion, and static planning) to handle dynamic input shapes efficiently on device. The contributions include a dynamic network design, a data-flow-driven compiler optimization suite, and an extensive mobile-device evaluation showing improved PSNR, real-time performance, and memory savings (e.g., a 1.7$\times$ speedup and a 1.61$\times$ memory reduction). The results indicate that algorithm–compiler–hardware co-design can make high-quality, on-device SR practical for edge video delivery systems.
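To illustrate why coarse-to-fine patching produces dynamic input shapes, the following is a minimal sketch rather than the authors' method. It assumes a grayscale frame stored as a list of rows, frame dimensions divisible by the coarse patch size, and a pixel-variance texture score. The function name `split_coarse_to_fine`, the patch sizes, and the threshold are all hypothetical; the summary above does not state the criterion Dy-DCA actually uses.

```python
from statistics import pvariance


def split_coarse_to_fine(frame, coarse=4, fine=2, threshold=100.0):
    """Split a 2D grayscale frame into patches of two possible sizes.

    Flat regions are kept as one coarse patch; textured regions are split
    into fine patches. The variance test is an illustrative assumption,
    not the criterion used by Dy-DCA.
    Returns a list of (row, col, size, patch) tuples.
    """
    height, width = len(frame), len(frame[0])
    patches = []
    for y in range(0, height, coarse):
        for x in range(0, width, coarse):
            block = [row[x:x + coarse] for row in frame[y:y + coarse]]
            pixels = [p for row in block for p in row]
            if pvariance(pixels) <= threshold:
                # Low texture: keep the coarse patch as-is.
                patches.append((y, x, coarse, block))
                continue
            # High texture: refine into smaller patches.
            for dy in range(0, coarse, fine):
                for dx in range(0, coarse, fine):
                    sub = [row[dx:dx + fine] for row in block[dy:dy + fine]]
                    patches.append((y + dy, x + dx, fine, sub))
    return patches


# Toy 4x8 frame: the left half is flat, the right half is a checkerboard.
frame = [
    [10] * 4 + [0 if (r + c) % 2 == 0 else 255 for c in range(4)]
    for r in range(4)
]
patches = split_coarse_to_fine(frame)
sizes = [size for _, _, size, _ in patches]
```

In this toy frame the flat half stays a single 4x4 patch while the checkerboard half becomes four 2x2 patches, so a downstream network receives inputs of more than one shape. That is the kind of dynamism the compiler framework described above is meant to handle.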
Abstract
Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. An emerging trend in video distribution systems is to exploit the overfitting property of DNNs to perform video resolution upscaling. By splitting a video into chunks and overfitting a super-resolution (SR) model to each chunk, this scheme of SR models plus video chunks can replace traditional video transmission, improving both video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to tremendous model-switching overhead and a large memory footprint at the user end. To resolve these problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline that reduces the number of models to one (Dy-DCA), which improves performance while conserving computational resources. Additionally, to achieve real acceleration at the user end, we design a framework that optimizes the dynamic features of Dy-DCA (e.g., dynamic shapes, sizes, and control flow) to enable a series of compilation optimizations, including fused code generation and static execution planning. With these techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Meanwhile, assisted by our compilation optimizations, we achieve a 1.7$\times$ speedup while reducing memory consumption by up to 1.61$\times$. Code is available at https://github.com/coulsonlee/Dy-DCA-ECCV2024.
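For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of the standard PSNR definition and of the per-frame time budget implied by a frame rate. It assumes 8-bit images stored as lists of rows; the helper names `psnr` and `frame_budget_ms` are illustrative, and the evaluation protocol used in the paper (for example, which color channel PSNR is computed on) may differ.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized images."""
    ref = [p for row in reference for p in row]
    out = [p for row in restored for p in row]
    if len(ref) != len(out):
        raise ValueError("images must have the same number of pixels")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref, out)) / len(ref)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


reference = [[52, 55], [61, 59]]
restored = [[54, 55], [60, 59]]
quality_db = psnr(reference, restored)
budget = frame_budget_ms(33)
```

At 33 FPS the budget works out to roughly 30.3 ms per frame, which all per-frame processing on the device has to fit within.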
