Towards Real-time Video Compressive Sensing on Mobile Devices

Miao Cao; Lishun Wang; Huan Wang; Guoqing Wang; Xin Yuan

TL;DR

The paper tackles real-time mobile reconstruction for video Snapshot Compressive Imaging by introducing MobileSCI, a lightweight end-to-end network based on 2D convolutions. The architecture uses a U-shaped feature enhancement module with a novel channel-splitting/shuffling feature mixing block and a knowledge-distillation strategy to achieve high reconstruction quality with low computational burden. In experiments, MobileSCI delivers competitive PSNR/SSIM with real-time inference on mobile hardware (≈35 FPS on an iPhone 15) and significantly outperforms prior methods in speed on mobile while maintaining accuracy on simulated and real data. This work demonstrates a viable path toward fully mobile video SCI systems with practical deployment potential.

Abstract

Video Snapshot Compressive Imaging (SCI) uses a low-speed 2D camera to capture high-speed scenes as snapshot compressed measurements, then applies a reconstruction algorithm to retrieve the high-speed video frames. Fast-evolving mobile devices and existing high-performance video SCI reconstruction algorithms motivate us to develop mobile reconstruction methods for real-world applications. Yet deploying previous reconstruction algorithms on mobile devices remains challenging because of their complex inference process, let alone achieving real-time mobile reconstruction. To the best of our knowledge, no existing video SCI reconstruction model is designed to run on mobile devices. To this end, we present MobileSCI, an effective approach for video SCI reconstruction that, for the first time, runs in real time on mobile devices. Specifically, we first build a U-shaped 2D convolution-based architecture, which is much more efficient and mobile-friendly than previous state-of-the-art reconstruction methods. In addition, an efficient feature mixing block, based on channel splitting and shuffling mechanisms, is introduced as a novel bottleneck block of MobileSCI to alleviate the computational burden. Finally, a customized knowledge distillation strategy is used to further improve the reconstruction quality. Extensive results on both simulated and real data show that MobileSCI achieves superior reconstruction quality with high efficiency on mobile devices. In particular, we can reconstruct a 256 × 256 × 8 snapshot compressed measurement in real time (about 35 FPS) on an iPhone 15. Code is available at https://github.com/mcao92/MobileSCI.
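To make the notion of a snapshot compressed measurement concrete, here is a minimal sketch of the sensing step that the reconstruction network has to invert. It assumes the commonly used video SCI forward model, in which each high-speed frame is multiplied element-wise by its own modulation mask and the results are summed into one 2D snapshot. The function name `sci_measurement`, the random binary masks, the toy 4 × 4 frame size, and the omission of sensor noise are all illustrative assumptions, not details taken from the paper.

```python
import random


def sci_measurement(frames, masks):
    """Compress T frames into one 2D snapshot: Y = sum_t masks[t] * frames[t].

    frames and masks are nested lists indexed as [t][row][col].
    Assumption: noise-free, element-wise mask modulation.
    """
    num_frames = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    snapshot = [[0.0] * width for _ in range(height)]
    for t in range(num_frames):
        for i in range(height):
            for j in range(width):
                snapshot[i][j] += masks[t][i][j] * frames[t][i][j]
    return snapshot


# Toy example: 8 frames of 4x4 pixels (the abstract's setting is 256 x 256 x 8).
rng = random.Random(0)
T, H, W = 8, 4, 4
frames = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(T)]
# Hypothetical random binary masks; a real system uses its calibrated masks.
masks = [[[rng.randint(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(T)]
snapshot = sci_measurement(frames, masks)
```

The snapshot has the spatial size of a single frame, so under this assumed model the reconstruction network must recover eight frames from one 2D input plus the known masks.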

Paper Structure (16 sections, 6 equations, 5 figures, 4 tables)

Introduction
Related Work
Video SCI Reconstruction Methods
Mobile Networks
Preliminary: Video SCI System
Our Proposed Methods
Motivation
Overall MobileSCI Architecture
U-shaped Feature Enhancement Module
Efficient Feature Mixing Block
Loss Function
Experiments
Results on Simulated Data
Results on Real Data
Ablation Study
...and 1 more sections

Figures (5)

Figure 1: (a) Overall pipeline of the proposed MobileSCI network. (b) The convolutional unit of the convolutional block contains two $3\times3$ convolutional layers followed by a LeakyReLU function. (c) The feature mixing block is composed of two feature mixer layers. (d) The feature mixer layer is composed of a channel projection layer and a $3\times3$ depth-wise convolutional layer. (e) In the channel projection layer, we first split the input feature ${\bf X}_{in}$ along the channel dimension into ${\bf X}_1$ and ${\bf X}_2$. Then, ${\bf X}_1$ passes through two $1\times1$ convolutional layers followed by a LeakyReLU function to obtain the output feature ${\bf X}_{c1}$. Finally, we concatenate ${\bf X}_{c1}$ and ${\bf X}_2$ and apply channel shuffling to obtain the output feature.
Figure 2: Illustration of the real video SCI system we built.
Figure 3: Reconstructed video frames of the simulated testing datasets. For a better view, we zoom in on a local area as shown in the small red boxes of each ground truth image, and do not show the small red boxes again for simplicity.
Figure 4: Reconstructed video frames of the real data. For a better view, we zoom in on two local areas, marked by the small red and green boxes in the first and second images.
Figure 5: Loss curve and PSNR value of different knowledge distillation strategies. In "Strategy 1", we randomly initialize the student model. In "Strategy 2", we integrate the pretrained weights from the teacher model to help the student model to learn.
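
The channel projection layer described in the Figure 1(e) caption (split, two $1\times1$ convolutions, LeakyReLU, concatenate, shuffle) can be sketched on the channel vector of a single pixel, where a $1\times1$ convolution reduces to a matrix-vector product. This is a rough illustration rather than the authors' implementation: the even split, the single LeakyReLU applied after the second convolution, the negative slope of 0.01, the two shuffle groups, and all function names are assumptions made here for the example.

```python
def leaky_relu(values, negative_slope=0.01):
    """Element-wise LeakyReLU (the slope value is an assumption)."""
    return [v if v >= 0 else negative_slope * v for v in values]


def pointwise_conv(channels, weight):
    """A 1x1 convolution at one pixel is a matrix-vector product over channels."""
    return [sum(w * c for w, c in zip(row, channels)) for row in weight]


def channel_shuffle(channels, groups=2):
    """View channels as (groups, n / groups), transpose, and flatten."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]


def channel_projection(x_in, weight_a, weight_b):
    """Split -> two 1x1 convs + LeakyReLU on X1 -> concat with X2 -> shuffle."""
    half = len(x_in) // 2  # assumed even split along the channel dimension
    x1, x2 = x_in[:half], x_in[half:]
    x_c1 = leaky_relu(pointwise_conv(pointwise_conv(x1, weight_a), weight_b))
    return channel_shuffle(x_c1 + x2, groups=2)


# Hypothetical weights: identity matrices keep the example easy to trace.
identity = [[1.0, 0.0], [0.0, 1.0]]
out = channel_projection([1.0, -2.0, 3.0, 4.0], identity, identity)
```

Only half of the channels pass through the $1\times1$ convolutions in this sketch, which is where the split saves computation; the shuffle then interleaves processed and untouched channels so that later layers see a mix of both.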
