Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design

Gen Li; Zhihao Shu; Jie Ji; Minghai Qin; Fatemeh Afghah; Wei Niu; Xiaolong Ma

TL;DR

This work targets the model-switching overhead of video overfitting-based SR by proposing Dy-DCA, a content-aware dynamic DNN that reduces the number of deployed models to one. It couples a coarse-to-fine patching strategy with a routing-based dynamic SR network and an accompanying compiler framework (operator classification, data-flow analysis, operator fusion, and static execution planning) to handle dynamic input shapes efficiently on device. The contributions are a dynamic network design, a data-flow-driven suite of compiler optimizations, and an extensive mobile-device evaluation showing improved PSNR, real-time performance, and lower resource use (a $1.7\times$ speedup and up to $1.61\times$ memory reduction). The results indicate that algorithm–compiler–hardware co-design can make high-quality, on-device SR practical for edge video delivery systems.

Abstract

Deep neural networks (DNNs) are frequently employed in a variety of computer vision applications. An emerging trend in video distribution systems is to take advantage of DNNs' overfitting property to perform video resolution upscaling. By splitting videos into chunks and overfitting a super-resolution (SR) model to each chunk, this scheme of SR models plus video chunks can replace traditional video transmission to enhance video quality and transmission efficiency. However, many models and chunks are needed to guarantee high performance, which leads to tremendous model-switching overhead and a large memory footprint at the user end. To resolve these problems, we propose a Dynamic Deep neural network assisted by a Content-Aware data processing pipeline (Dy-DCA) that reduces the number of models to one, improving performance while conserving computational resources. Additionally, to achieve real acceleration on the user end, we design a framework that optimizes the dynamic features of Dy-DCA (e.g., dynamic shapes, sizes, and control flow) to enable a series of compilation optimizations, including fused code generation and static execution planning. With these techniques, our method achieves better PSNR and real-time performance (33 FPS) on an off-the-shelf mobile phone. Assisted by our compilation optimizations, we achieve a $1.7\times$ speedup while reducing memory consumption by up to $1.61\times$. Code is available at https://github.com/coulsonlee/Dy-DCA-ECCV2024.
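Because the abstract reports quality in PSNR, a minimal sketch of that metric may help as a reference point. It uses the standard definition, PSNR = 10 log10(MAX^2 / MSE), on small grayscale images stored as nested lists. The function name `psnr`, the `max_value` parameter, and the toy pixel values are illustrative assumptions, not taken from the paper's code.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Return PSNR in dB between two equally sized grayscale images.

    Images are nested lists of pixel intensities (hypothetical toy format).
    """
    ref_flat = [p for row in reference for p in row]
    out_flat = [p for row in restored for p in row]
    if not ref_flat or len(ref_flat) != len(out_flat):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_flat, out_flat)) / len(ref_flat)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy example: a 2x2 high-resolution patch and its super-resolved estimate.
hr = [[52, 55], [61, 59]]
sr = [[54, 55], [60, 57]]
print(round(psnr(hr, sr), 2))  # prints 44.61
```

Higher PSNR means the super-resolved output is closer to the original frame, so an overfitted model is expected to score well on the chunk it was trained on.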

Paper Structure (20 sections, 4 figures, 7 tables)

Introduction
Algorithm and hardware co-design
Motivations
Algorithm level optimization for hardware friendliness
Compiler level optimization to better support algorithm
DNN Operator Classification
Data-flow analysis
Operator fusion
Static execution planning
Experimental results
Experiment settings
Evaluation on VSD4K and UVG datasets
Deployment on Mobile Devices
More discussion on one-size-fits-all method
Ablation Study
...and 5 more sections

Figures (4)

Figure 1: Model-switching overhead on backbones widely used in video data overfitting. (a) shows the switching time in EDSR [lim2017enhanced] and WDSR [yu2018wide]. (b) compares video length with switching overhead. (c) shows the total energy consumption caused by model switching.
Figure 2: Overview of the proposed framework Dy-DCA. We split video frames into patches of different shapes; all patches are routed by a learnable gating module and then overfitted by a dynamic SR model. The dynamic SR model and LR patches are delivered to users for video super-resolution. On-device inference is accelerated by our compiler optimization framework.
Figure 3: The lattice of the data-flow domain.
Figure 4: Super-resolution quality comparison between Dy-DCA and baseline methods.
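
The Figure 3 caption refers to a lattice for data-flow analysis, but the lattice itself is not reproduced on this page. As a rough illustration of the general idea only, the sketch below uses the classic constant-propagation lattice applied to tensor shapes: `BOTTOM` (no information yet), a known constant shape, and `TOP` (shape varies at run time). The names `ShapeFact`, `const`, and `join` are hypothetical, and this is not claimed to be the paper's actual domain.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ShapeFact:
    """One element of a three-level shape lattice (assumed, not from the paper)."""
    kind: str  # "bottom", "const", or "top"
    shape: Optional[Tuple[int, ...]] = None


BOTTOM = ShapeFact("bottom")  # nothing known yet
TOP = ShapeFact("top")        # shape is dynamic / unknown at compile time


def const(*dims: int) -> ShapeFact:
    """A shape that is fully known at compile time."""
    return ShapeFact("const", tuple(dims))


def join(a: ShapeFact, b: ShapeFact) -> ShapeFact:
    """Least upper bound: merge facts arriving from two control-flow paths."""
    if a == BOTTOM:
        return b
    if b == BOTTOM:
        return a
    if a == TOP or b == TOP:
        return TOP
    # Both are constants: they agree, or the shape is dynamic.
    return a if a.shape == b.shape else TOP


# Two routing paths that feed patches of the same size keep a static shape.
print(join(const(1, 3, 64, 64), const(1, 3, 64, 64)).kind)  # prints const
# Paths with different patch sizes force the merged shape to be dynamic.
print(join(const(1, 3, 64, 64), const(1, 3, 32, 32)).kind)  # prints top
```

In a design like this, operators whose input facts stay constant could be planned statically, while those that reach `TOP` would need shape-generic code.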
