Microsaccade-inspired Event Camera for Robotics

Botao He, Ze Wang, Yuan Zhou, Jingxi Chen, Chahat Deep Singh, Haojia Li, Yuman Gao, Shaojie Shen, Kaiwei Wang, Yanjun Cao, Chao Xu, Yiannis Aloimonos, Fei Gao, Cornelia Fermuller

TL;DR

An event-based perception system that simultaneously maintains low reaction time and stable texture. Compared with standard event cameras, the enhanced event camera acquires more information about the environment and can estimate high-speed motion, showing potential for adoption in robot vision.

Abstract

Neuromorphic vision sensors or event cameras have made the visual perception of extremely low reaction time possible, opening new avenues for high-dynamic robotics applications. The output of an event camera depends on both motion and texture. However, the event camera fails to capture object edges that are parallel to the camera motion. This is a problem intrinsic to the sensor and therefore challenging to solve algorithmically. Human vision deals with perceptual fading using the active mechanism of small involuntary eye movements, the most prominent of which are called microsaccades. By moving the eyes constantly and slightly during fixation, microsaccades can substantially maintain texture stability and persistence. Inspired by microsaccades, we designed an event-based perception system capable of simultaneously maintaining low reaction time and stable texture. In this design, a rotating wedge prism was mounted in front of the aperture of an event camera to redirect light and trigger events. The geometrical optics of the rotating wedge prism allows for algorithmic compensation of the additional rotational motion, resulting in a stable texture appearance and high informational output independent of external motion. The hardware device and software solution are integrated into a system, which we call Artificial MIcrosaccade-enhanced EVent camera (AMI-EV). Benchmark comparisons validate the superior data quality of AMI-EV recordings in scenarios where both standard cameras and event cameras fail to deliver. Various real-world experiments demonstrate the potential of the system to facilitate robotics perception for both low-level and high-level vision tasks.
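The compensation idea in the abstract can be illustrated with a minimal sketch. It assumes a first-order model that is not taken from the paper: the rotating wedge prism shifts the whole image by a fixed radius (in pixels) along a circle, in phase with the prism's rotation angle. The function names, the parameters `radius_px`, `omega_rad_s`, and `phase0`, and the event tuple layout `(x, y, t, polarity)` are all hypothetical; the actual AMI-EV calibration and compensation may differ.

```python
import math

def simulate_static_point(x0, y0, radius_px, omega_rad_s, times, phase0=0.0):
    """Events from a static scene point seen through the rotating prism.

    Assumed model: the prism adds a circular offset of radius `radius_px`
    whose direction follows the prism angle omega * t + phase0.
    """
    events = []
    for t in times:
        theta = omega_rad_s * t + phase0
        x = x0 + radius_px * math.cos(theta)
        y = y0 + radius_px * math.sin(theta)
        events.append((x, y, t, 1))  # polarity fixed to 1 for illustration
    return events

def compensate_events(events, radius_px, omega_rad_s, phase0=0.0):
    """Subtract the prism-induced circular offset from every event."""
    compensated = []
    for x, y, t, polarity in events:
        theta = omega_rad_s * t + phase0
        compensated.append((x - radius_px * math.cos(theta),
                            y - radius_px * math.sin(theta),
                            t, polarity))
    return compensated

# A static point traces a circle in the raw stream; after compensation
# every event maps back to the same image location.
raw = simulate_static_point(100.0, 50.0, 5.0, 2 * math.pi * 10,
                            [i * 1e-3 for i in range(50)])
stable = compensate_events(raw, 5.0, 2 * math.pi * 10)
```

Under this assumed model, the prism keeps triggering events even when the scene and camera are static, while the subtraction step removes the added motion so that accumulated events form a stable texture.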

Paper Structure (26 sections, 17 equations, 8 figures)

Figures (8)

  • Figure 1: Demonstration of how microsaccades counteract visual fading. A simple yet intuitive example demonstrating visual fading and how microsaccades counteract it. We recommend enlarging the image to at least $15 \times 15$ cm and keeping your eyes 40 cm away from the screen. After a few seconds of fixation on the red spot, the bluish annulus and the background will fade. This is because microsaccades are suppressed during this time, and therefore the eye cannot provide effective visual stimulation to prevent peripheral fading. In contrast, when saccading between the purple spots, the annulus remains visible, or at most fades more slowly, even though the saccades are small, typically $0.5^\circ$-$1.0^\circ$ depending on the viewer's distance from the figure.
  • Figure 2: Overview of our entire system, including both hardware and software. (A) Real-world hardware and Computer-Aided Design (CAD) model. (B) Illustration of the incoming light refraction as the wedge prism rotates. (C) Event generation and compensation process, with the images on the left resulting from accumulating the event streams shown on the right. (D) System overview.
  • Figure 3: Illustration of our approach's improvement on texture enhancement. (A) The ODS-F (higher is better) is used to measure the structural completeness of the accumulated event images. (B) Temporal snapshots of (A). (C) Comparison of the reconstructed gray-scale images. (C) shows snapshots of (F); a red box indicates that the system is static, and a purple box indicates that the system is moving upward (along the Y-axis). (D) Histogram of the event density distribution for the original event stream and our enhanced event stream. More detailed illustrations can be found in Suppl. Fig. S10. (E) Entropy comparison of accumulated event images. In (A) and (E), solid curves indicate the median value over a time window of 10 data points, while the top and bottom bounds of the transparent regions indicate the maximum and minimum values. (F) Quantitative comparison of the reconstructed image quality using the Natural Image Quality Evaluator (NIQE, lower is better) [mittal2012making].
  • Figure 4: Evaluation of feature detection and matching. (A) Environment setups of four experiments. (B) Results of the corner detection and tracking experiments. The left column of (i)-(iii) provides a comparison of the number of trackable corners, and the three right columns show snapshots. (iv) and (v) are metric comparisons visualized using box and bar graphs. (iv) indicates the lifetime of all trackable corners, and (v) shows the response time. (C) Results of the motion segmentation experiment. Blue parts indicate the background and red parts indicate independently moving objects.
  • Figure 5: Evaluation of human detection and pose estimation. (A-C) Results of human pose estimation for S-EV (A), AMI-EV (B), and a standard camera (C) on four actions: waving the hand, shaking the arms, baseball batting, and ping-pong batting. The former two actions are slow and the latter two are fast. (D) Metric comparisons. The framerate denotes the number of frames per second that the standard event-to-video algorithm, called E2VID [rebecq2019high], is configured to generate. Intersection over Union (IoU) provides a measure of human detection performance, and Percentage Detected Joints (PDJ) is a measure of the detected joints' localization precision and completeness. Because the sampling frame rate varies greatly across sensors, we use a semilog plot (the x-axis has a log scale) to visualize the data.
  • ...and 3 more figures
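
Two of the metrics named in the captions above, the entropy of an accumulated event image (Figure 3E) and Intersection over Union (Figure 5D), can be sketched as follows. This is an illustrative sketch, not the paper's evaluation code. It assumes that "entropy" means the Shannon entropy of the histogram of per-pixel event counts and that boxes are axis-aligned `(x0, y0, x1, y1)` tuples; all function names are hypothetical.

```python
import math
from collections import Counter

def accumulate(events, width, height):
    """Count events per pixel; events are (x, y, t, polarity) with integer x, y."""
    image = [[0] * width for _ in range(height)]
    for x, y, _t, _polarity in events:
        image[y][x] += 1
    return image

def image_entropy(image):
    """Shannon entropy (bits) of the histogram of pixel values (assumed definition)."""
    counts = Counter(value for row in image for value in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def box_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    inter_w = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    inter_h = max(0.0, min(ay1, by1) - max(ay0, by0))
    intersection = inter_w * inter_h
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - intersection
    return intersection / union if union > 0 else 0.0

# Example: two events on a 2x1 sensor hit the same pixel, so the image holds
# the values {2, 0} once each, giving an entropy of 1 bit.
demo_image = accumulate([(0, 0, 0.0, 1), (0, 0, 0.1, 1)], width=2, height=1)
demo_entropy = image_entropy(demo_image)
demo_iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))  # overlap 1, union 7
```

Under the assumed definition, an image with no events has zero entropy, which matches the intuition that an event stream from a static scene carries no texture information.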