
Lightweight High-Speed Photography Built on Coded Exposure and Implicit Neural Representation of Videos

Zhihong Zhang, Runzhao Yang, Jinli Suo, Yuxiao Cheng, Qionghai Dai

TL;DR

This study tackles the challenge of restoring motion from blur by combining the classical coded exposure imaging technique with the emerging implicit neural representation (INR) for videos, and develops an INR-based blur decomposition network.

Abstract

The demand for compact cameras capable of recording high-speed scenes at high resolution is steadily increasing. However, such capabilities usually entail high bandwidth requirements, resulting in bulky, heavy systems unsuitable for low-capacity platforms. A lightweight alternative is to use a coded exposure setup to encode a frame sequence into a single blurry snapshot and then retrieve the latent sharp video from it. Nevertheless, restoring motion from blur remains a formidable challenge due to the inherent ill-posedness of motion blur decomposition, the intrinsic ambiguity in motion direction, and the diverse motions present in natural videos. In this study, we address these challenges by combining the classical coded exposure imaging technique with the emerging implicit neural representation (INR) for videos. We strategically embed motion direction cues into the blurry image during the imaging process. We further develop an INR-based blur decomposition network that sequentially extracts the latent video frames from the blurry image by leveraging the embedded motion direction cues. To validate the effectiveness and efficiency of the proposed framework, we conduct extensive experiments on benchmark datasets and real-captured blurry images. The results demonstrate that our approach significantly outperforms existing methods in terms of both quality and flexibility. The code is available at https://github.com/zhihongz/BDINR.
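As a rough illustration of the encoding step described above, the sketch below implements a generic coded-exposure forward model: each latent frame is integrated into the snapshot only when the corresponding bit of a binary shutter code is 1. The function name `coded_exposure`, the nested-list frame format, and the normalization by the number of open shutter slots are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence

Frame = List[List[float]]  # a grayscale frame stored as rows of pixel values


def coded_exposure(frames: Sequence[Frame], code: str) -> Frame:
    """Collapse a frame sequence into one coded blurry snapshot.

    Frame t contributes to the snapshot only if code[t] == "1".
    The result is divided by the number of open slots (an assumption made
    here so that overall brightness does not depend on the code).
    """
    if len(frames) != len(code):
        raise ValueError("need exactly one code bit per latent frame")
    open_slots = code.count("1")
    if open_slots == 0:
        raise ValueError("the shutter never opens for this code")

    h, w = len(frames[0]), len(frames[0][0])
    blurry = [[0.0] * w for _ in range(h)]
    for frame, bit in zip(frames, code):
        if bit == "1":  # shutter open: accumulate this frame
            for i in range(h):
                for j in range(w):
                    blurry[i][j] += frame[i][j]
    return [[v / open_slots for v in row] for row in blurry]
```

For example, five 1×1 frames with values 0, 1, 2, 3, 4 give 2.0 under conventional exposure ("11111") but 1.75 under the code "11101", because the fourth frame is blocked by the closed shutter slot.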

Paper Structure

This paper contains 16 sections, 6 equations, 13 figures, and 5 tables.

Figures (13)

  • Figure 1: The overall schematic and demo results of the proposed blur decomposition framework. On the imaging side, coded exposure photography is employed to embed motion direction cues into the captured coded blurry image. It also helps preserve the information in the blurry image across all frequencies. On the algorithm side, a video-INR-based self-recursive blur decomposition network (BDINR) is developed to extract the latent video sequence collapsed into the coded blurry image by exploiting the embedded motion direction cues.
  • Figure 2: Motion ambiguity in blur decomposition and motion direction embedding via coded exposure. In this toy example, two horizontally translating objects, i.e., the orange cube and the green ball, are used for demonstration. (a) shows four possible motion scenarios of the two objects. Each translates by the same distance from its current position to the dashed box/circle. (b), (c), and (d) show the resulting blurry images captured under conventional exposure ('11111'), coded exposure with an asymmetric encoding sequence ('11101'), and coded exposure with a symmetric encoding sequence ('11011'), respectively. The center-line intensity profile of each blurry image is plotted on its right. (c) demonstrates that coded exposure with an asymmetric encoding sequence yields asymmetric blurry profiles, from which the moving direction can be retrieved (i.e., from the black arrow towards the blue arrow). Conversely, the cases in (b) and (d) produce identical blurry images for different combinations of motion directions, which causes the motion direction ambiguity in blur decomposition.
  • Figure 3: The overall flowchart of the proposed video INR based self-recursive blur decomposition network (BDINR). The temporal embedding module (TEM) fuses the frame order index and corresponding exposure-encoding sequence to generate the temporal context embedding. The spatial embedding module (SEM) maps the coded blurry image into a continuous feature space to serve as the spatial context embedding. These embeddings are then input to the video INR module (INRV) for latent frame extraction in a self-recursive manner.
  • Figure 4: The specific network structure of different modules involved in BDINR. INRV comprises a two-level encoder-decoder architecture to fuse the spatial and temporal embeddings. TEM is implemented with a two-layer perceptron. SEM and the rest of the modules are mainly composed of convolutional layers and residual blocks.
  • Figure 5: Params-PSNR-MACs comparison with competing methods on GoPro. Bubble size indicates MACs.
  • ...and 8 more figures
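The direction ambiguity illustrated in Figure 2 can be reproduced with a one-dimensional toy sketch: a single bright point moves one pixel per time slot, and the coded blurry profile for rightward motion is compared with that for leftward motion over the same span. The helper names `coded_blur_1d` and `direction_ambiguous` are hypothetical. The two profiles coincide exactly when the code reads the same forwards and backwards, which matches the caption: '11111' and '11011' are ambiguous, while '11101' is not.

```python
from typing import List


def coded_blur_1d(code: str, start: int, step: int, width: int) -> List[int]:
    """Blur profile of one bright point moving `step` pixels per time slot.

    At slot t the point sits at start + step * t; it is recorded only
    when the shutter is open (code[t] == "1").
    """
    profile = [0] * width
    for t, bit in enumerate(code):
        if bit == "1":
            profile[start + step * t] += 1
    return profile


def direction_ambiguous(code: str, width: int = 8) -> bool:
    """True if rightward and leftward motion over the same span look identical."""
    n = len(code)
    right = coded_blur_1d(code, start=1, step=1, width=width)  # pixels 1 .. n
    left = coded_blur_1d(code, start=n, step=-1, width=width)  # pixels n .. 1
    return right == left


for code in ("11111", "11101", "11011"):
    print(code, "ambiguous" if direction_ambiguous(code) else "direction recoverable")
```

For '11101', rightward motion gives the profile [0, 1, 1, 1, 0, 1, 0, 0] and leftward motion gives [0, 1, 0, 1, 1, 1, 0, 0]; the position of the gap reveals the direction, which is the cue the coded exposure embeds.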
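The captions of Figures 3 and 4 state that the TEM is a two-layer perceptron that fuses the frame order index with the exposure-encoding sequence. A minimal sketch of such a module, assuming the two inputs are fused by concatenating a normalized frame index with the code bits, might look as follows. The class name `TwoLayerTEM`, the layer widths, the ReLU activation, and the random, untrained weights are all hypothetical; in the actual network the weights would presumably be learned with the rest of BDINR.

```python
import random
from typing import List


def relu(v: List[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def linear(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    # One fully connected layer: y = W x + b
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class TwoLayerTEM:
    """Hypothetical temporal embedding module: a two-layer perceptron over
    [normalized frame index, exposure-code bits]."""

    def __init__(self, code_len: int, hidden: int = 16, out_dim: int = 8, seed: int = 0):
        rng = random.Random(seed)  # fixed seed: weights are random, not trained
        self.code_len = code_len
        in_dim = 1 + code_len
        self.w1 = [[rng.gauss(0.0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(out_dim)]
        self.b2 = [0.0] * out_dim

    def __call__(self, t: int, code: str) -> List[float]:
        if len(code) != self.code_len:
            raise ValueError("code length does not match this module")
        if not 0 <= t < self.code_len:
            raise ValueError("frame index out of range")
        # Assumption: one latent frame per code slot, index scaled to [0, 1]
        t_norm = t / (self.code_len - 1) if self.code_len > 1 else 0.0
        x = [t_norm] + [1.0 if c == "1" else 0.0 for c in code]
        return linear(relu(linear(x, self.w1, self.b1)), self.w2, self.b2)


tem = TwoLayerTEM(code_len=5)
embedding = tem(2, "11101")  # temporal context embedding for the third frame
print(len(embedding))
```

Concatenation is just the simplest way to give a perceptron both the "which frame" and "which shutter pattern" information; the paper's exact fusion scheme may differ.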