Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks

Xu Zheng, Yexin Liu, Yunfan Lu, Tongyan Hua, Tianbo Pan, Weiming Zhang, Dacheng Tao, Lin Wang

TL;DR

The surveyed work addresses the demand for DL-based techniques in event-based vision by organizing methods around input representations, quality enhancement, image restoration, and high-level scene understanding. It surveys six representative event representations, methods for denoising and super-resolution, and a broad range of DL approaches for tasks from intensity reconstruction to SLAM and 3D human pose estimation. The paper also benchmarks representative methods across image reconstruction, deblurring, and object recognition tasks, and discusses open challenges, new directions, and the importance of public code repositories. The findings underscore the potential of DL to unlock the advantages of event cameras while highlighting remaining gaps in data, latency, and cross-modal fusion that must be addressed for real-world deployment.

Abstract

Event cameras are bio-inspired sensors that capture per-pixel intensity changes asynchronously and produce event streams encoding the time, pixel position, and polarity (sign) of each intensity change. They offer several advantages over conventional frame-based cameras, including high temporal resolution, high dynamic range, and low latency. Because they can capture information in challenging visual conditions, event cameras have the potential to overcome the limitations of frame-based cameras in computer vision and robotics. In recent years, deep learning (DL) has been brought to this emerging field and has inspired active research into its potential. However, a taxonomy of DL techniques for event-based vision is still lacking. We first examine the typical event representations together with quality enhancement methods, as they play a pivotal role as inputs to DL models. We then provide a comprehensive survey of existing DL-based methods, grouping them into two major categories: 1) image/video reconstruction and restoration; 2) event-based scene understanding and 3D vision. We conduct benchmark experiments for existing methods in several representative research directions, namely image reconstruction, deblurring, and object recognition, to identify critical insights and problems. Finally, we discuss open challenges and offer new perspectives to inspire further research.
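
To make the event-stream format described in the abstract concrete, the following minimal sketch models each event as a (time, x, y, polarity) tuple and sums polarities per pixel into a 2D grid. This is an illustrative assumption, not code from the paper: the `Event` type, the `accumulate_events` helper, the microsecond timestamps, and the toy 3x2 sensor are all hypothetical, and signed accumulation is only one simple way to turn events into a frame-like input for a DL model.

```python
from typing import List, NamedTuple


class Event(NamedTuple):
    """One event (hypothetical layout): timestamp, pixel position, polarity."""
    t: int  # timestamp, assumed here to be in microseconds
    x: int  # pixel column
    y: int  # pixel row
    p: int  # polarity: +1 for a brightness increase, -1 for a decrease


def accumulate_events(events: List[Event], width: int, height: int) -> List[List[int]]:
    """Sum event polarities per pixel into a height x width grid."""
    frame = [[0] * width for _ in range(height)]
    for ev in events:
        # Skip events that fall outside the assumed sensor bounds.
        if 0 <= ev.x < width and 0 <= ev.y < height:
            frame[ev.y][ev.x] += ev.p
    return frame


# Toy stream on a hypothetical 3x2 sensor (values are made up for illustration).
events = [
    Event(t=10, x=0, y=0, p=+1),
    Event(t=25, x=2, y=1, p=-1),
    Event(t=40, x=0, y=0, p=+1),
    Event(t=55, x=1, y=0, p=-1),
]
frame = accumulate_events(events, width=3, height=2)
print(frame)
```

Note that this accumulation discards the timestamps, so the fine temporal resolution of the stream is lost; representations that retain timing information need additional structure beyond this sketch.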

Paper Structure (34 sections, 3 equations, 12 figures, 11 tables)

Figures (12)

  • Figure 1: The structural and hierarchical taxonomy of event-based vision with deep learning.
  • Figure 2: Methods for event-based intensity reconstruction.
  • Figure 3: Visual examples of some SOTA methods for video reconstruction (E2VID [rebecq2019high], EF [stoffregen2020reducing], RCNN [zou2021learning]).
  • Figure 4: Representative VFI methods, including (a) TimeLens [tulyakov2021time], the first event-guided VFI method; (b) TimeLens++ [tulyakov2022time], the SOTA event-based VFI method; (c) TimeReplayer [he2022timereplayer], the first unsupervised event-guided VFI method.
  • Figure 5: Visual results of VFI by three different methods (TimeLens [tulyakov2021time], TimeReplayer [he2022timereplayer], $A^2OF$ [wu2022video]).
  • ...and 7 more figures