Enhanced Event-Based Video Reconstruction with Motion Compensation

Siying Liu, Pier Luigi Dragotti

TL;DR

This work proposes warping the input intensity frames and sparse codes to enhance reconstruction quality. The resulting CISTA-Flow network achieves state-of-the-art reconstruction accuracy while also providing reliable dense flow estimation.

Abstract

Deep neural networks for event-based video reconstruction often suffer from a lack of interpretability and high memory demands. A lightweight network called CISTA-LSTC was recently introduced, showing that high-quality reconstruction can be achieved through systematic architectural design. However, its modelling assumption that the input signals and the reconstructed output frame share the same sparse representation neglects the displacement caused by motion. To address this, we propose warping the input intensity frames and sparse codes to enhance reconstruction quality. A CISTA-Flow network is constructed by integrating a flow network with CISTA-LSTC for motion compensation. The system relies solely on events: the predicted flow aids reconstruction, and the reconstructed frames in turn facilitate flow estimation. We also introduce an iterative training framework for this combined system. Results demonstrate that our approach achieves state-of-the-art reconstruction accuracy while providing reliable dense flow estimation. Furthermore, our model is flexible in that it can integrate different flow networks, suggesting potential for further performance enhancement.

Paper Structure

This paper contains 23 sections, 9 equations, 7 figures, and 3 tables.

Figures (7)

  • Figure 1: Recursive CISTA-Flow architecture. (a) Original CISTA-LSTC network. (b) CISTA-Flow network. The DCEIFlow network leverages the previously reconstructed frame $\hat{\bm I}_{t-1}$ and event voxel grid ${\bm E}_{t-1}^t$ to estimate the forward flow $\hat{\bm F}_{t-1\rightarrow t}$. This estimated flow is then utilized to warp $\hat{\bm I}_{t-1}$ and the sparse codes ${\bm Z}_{t-1}^K$ obtained from the previous reconstruction. Finally, CISTA-LSTC employs these warped inputs to reconstruct the frame $\hat{\bm I}_{t}$. The initial $\hat{\bm I}_{0}$ is set to 0.
  • Figure 2: Reconstruction results with different warped frames. "Warp I" denotes the model with warped $\hat{\bm I}_{t-1}$ and "Warp I+Z" denotes the model with warped $\hat{\bm I}_{t-1}$ and ${\bm Z}_{t-1}^{K}$.
  • Figure 3: Separate training for (a) DCEIFlow using ground truth input frame and (b) CISTA-LSTC using ground truth flow, and additional training for (c) DCEIFlow (Rec I) using reconstructed frames generated by CISTA (GT Flow).
  • Figure 4: Comparison of video reconstruction between CISTA-Flow and other networks.
  • Figure 5: Reconstruction results of CISTA-LSTC and CISTA-Flow networks for scenes with multiple objects. The green boxes highlight edges and fine details that the estimated flow makes sharper, whereas the red boxes indicate areas where inaccurate flow degrades the reconstruction.
  • ...and 2 more figures
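
The recursion described in the Figure 1 caption (estimate flow from the previous frame and the events, warp the previous frame and sparse codes, then reconstruct) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `flow_net` and `recon_net` are hypothetical callables standing in for DCEIFlow and CISTA-LSTC, the sparse codes are assumed to be zero-initialised and to share the frame resolution, and warping is approximated by bilinear sampling at `x - flow(x)` with border clamping, which may differ from the operator used in the paper.

```python
import math


def warp(image, flow):
    """Warp a 2D image (list of rows) with a dense flow field.

    flow[y][x] is a (dx, dy) pair. Assumed approximation:
    out[y][x] = image sampled bilinearly at (x - dx, y - dy),
    with sample coordinates clamped to the image border.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dx, dy = flow[y][x]
            sx = min(max(x - dx, 0.0), w - 1.0)
            sy = min(max(y - dy, 0.0), h - 1.0)
            x0, y0 = int(math.floor(sx)), int(math.floor(sy))
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            wx, wy = sx - x0, sy - y0
            top = (1 - wx) * image[y0][x0] + wx * image[y0][x1]
            bottom = (1 - wx) * image[y1][x0] + wx * image[y1][x1]
            out[y][x] = (1 - wy) * top + wy * bottom
    return out


def reconstruct_sequence(event_grids, flow_net, recon_net, height, width, n_codes):
    """Run the recursive flow -> warp -> reconstruct loop over event grids.

    flow_net(prev_frame, events) -> flow              (hypothetical signature)
    recon_net(events, warped_frame, warped_codes)
        -> (frame, codes)                             (hypothetical signature)
    """
    # The initial frame is all zeros, as stated in the Figure 1 caption.
    prev_frame = [[0.0] * width for _ in range(height)]
    # Assumption: sparse codes start at zero and match the frame resolution.
    prev_codes = [[[0.0] * width for _ in range(height)] for _ in range(n_codes)]
    frames, flows = [], []
    for events in event_grids:
        flow = flow_net(prev_frame, events)
        warped_frame = warp(prev_frame, flow)
        warped_codes = [warp(code, flow) for code in prev_codes]
        prev_frame, prev_codes = recon_net(events, warped_frame, warped_codes)
        frames.append(prev_frame)
        flows.append(flow)
    return frames, flows
```

Because each reconstructed frame feeds the next flow estimate, errors in either stage can propagate through the loop, which is consistent with the flow-related artifacts noted for Figure 5.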