Efficient Bayer-Domain Video Computer Vision with Fast Motion Estimation and Learned Perception Residual
Haichao Wang, Jiangtao Wen, Yuxing Han
TL;DR
The paper tackles two bottlenecks of video perception on mobile and edge devices: ISP-induced latency and temporal redundancy. It proposes an end-to-end Bayer-domain pipeline that removes the image signal processor (ISP), uses a fast GPU-oriented motion estimation algorithm to propagate predictions across frames, and applies a learned perception residual to refine the propagated features, achieving substantial acceleration with minimal accuracy loss. Key contributions include Bayer raw data training driven by an invertible ISP, a pyramidal, parallelizable motion estimation module, and a lightweight PRNet (perception residual network) guided by a BerHu-based loss to correct perception-level errors. The approach demonstrates strong speedups on video semantic segmentation and object detection, indicating practical value for real-time, resource-constrained video vision systems.
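For readers unfamiliar with the BerHu (reverse Huber) loss mentioned above, the following is a minimal sketch of its standard definition: errors up to a threshold c are penalized linearly, and larger errors quadratically. The function name `berhu_loss`, the `c_ratio` parameter, and the choice of c as 20% of the largest absolute error are assumptions taken from common practice; the summary does not state how the paper sets the threshold or how its BerHu-based loss is built on top of this form.

```python
def berhu_loss(pred, target, c_ratio=0.2):
    """Mean reverse-Huber (BerHu) loss over two equal-length sequences.

    Assumption: the threshold c is c_ratio times the largest absolute
    error in the batch, a common convention and not taken from the paper.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal in length")

    errors = [abs(p - t) for p, t in zip(pred, target)]
    c = c_ratio * max(errors)
    if c == 0.0:
        return 0.0  # all predictions are exact

    total = 0.0
    for e in errors:
        if e <= c:
            total += e                              # L1 branch for small errors
        else:
            total += (e * e + c * c) / (2.0 * c)    # quadratic branch for large errors
    return total / len(errors)


loss = berhu_loss([0.0, 0.0, 0.0], [0.1, 0.0, 1.0])
```

The two branches meet at an error of exactly c, so the loss is continuous while weighting large residuals more heavily than a plain L1 loss would.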
Abstract
Video computer vision systems face substantial computational burdens arising from two fundamental challenges: eliminating unnecessary processing and reducing temporal redundancy in back-end inference while maintaining accuracy with minimal extra computation. To address these issues, we propose an efficient video computer vision framework that jointly optimizes both the front end and back end of the pipeline. On the front end, we remove the traditional image signal processor (ISP) and feed Bayer raw measurements directly into Bayer-domain vision models, avoiding costly human-oriented ISP operations. On the back end, we introduce a fast and highly parallel motion estimation algorithm that extracts inter-frame temporal correspondence to avoid redundant computation. To mitigate artifacts caused by motion inaccuracies, we further employ lightweight perception residual networks that directly learn perception-level residuals and refine the propagated features. Experiments across multiple models and tasks demonstrate that our system achieves substantial acceleration with only minor performance degradation.
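The back-end idea of reusing a previous prediction through motion vectors can be illustrated with a small sketch. This is a single-level, exhaustive block-matching search written in plain Python for clarity, not the pyramidal, GPU-parallel algorithm the paper proposes; the names `block_motion` and `propagate`, the block size, and the search range are hypothetical, and the frame dimensions are assumed to be multiples of the block size.

```python
def block_motion(prev, cur, block=2, search=1):
    """For each block of `cur`, return the offset (dy, dx) into `prev`
    with the lowest sum of absolute differences (SAD).
    Ties are broken in favor of the smaller motion."""
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h, block):
        for bx in range(0, w, block):
            best = None
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y0, x0 = by + dy, bx + dx
                    # Skip candidates that fall outside the previous frame.
                    if y0 < 0 or x0 < 0 or y0 + block > h or x0 + block > w:
                        continue
                    sad = sum(
                        abs(cur[by + i][bx + j] - prev[y0 + i][x0 + j])
                        for i in range(block)
                        for j in range(block)
                    )
                    cand = (sad, abs(dy) + abs(dx), dy, dx)
                    if best is None or cand < best:
                        best = cand
            vectors[(by, bx)] = (best[2], best[3])
    return vectors


def propagate(prev_pred, vectors, block=2):
    """Build the current prediction by copying blocks of the previous
    prediction along the estimated motion vectors."""
    h, w = len(prev_pred), len(prev_pred[0])
    out = [[0] * w for _ in range(h)]
    for (by, bx), (dy, dx) in vectors.items():
        for i in range(block):
            for j in range(block):
                out[by + i][bx + j] = prev_pred[by + dy + i][bx + dx + j]
    return out


# Toy example: a 4x4 frame whose content shifts one pixel to the right.
prev_frame = [[r * 10 + c for c in range(4)] for r in range(4)]
cur_frame = [[row[0]] + row[:3] for row in prev_frame]
prev_labels = [[0, 1, 1, 0] for _ in range(4)]   # per-pixel labels for prev_frame

vecs = block_motion(prev_frame, cur_frame)
propagated = propagate(prev_labels, vecs)
```

In this toy case the true labels after the shift are `[0, 0, 1, 1]` in every row, while the propagated rows come out as `[0, 1, 1, 1]`: block-level motion cannot represent the newly exposed left edge. Errors of this kind are what the perception residual networks described in the abstract are meant to correct.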
