Neural Ganglion Sensors: Learning Task-specific Event Cameras Inspired by the Neural Circuit of the Human Retina

Haley M. So, Gordon Wetzstein

TL;DR

This work introduces Neural Ganglion Sensors, a retina-inspired extension of event cameras that learns task-specific spatio-temporal retinal kernels (RGC events) to improve perception tasks while reducing event bandwidth. By formulating a differentiable RGC event model with learnable kernels and thresholds, and enabling differentiable binning through a closed-form mapping, the approach bridges bio-inspired sensing with end-to-end learning. The framework supports multiple RGC channels and center-surround configurations, and is evaluated on video interpolation and optical flow, showing superior performance and bandwidth efficiency compared with traditional DVS/CSDVS baselines. These results demonstrate the potential of RGC-inspired event sensing for edge devices and real-time, low-power vision applications, with clear directions for hardware integration and future exploration of non-binary nonlinearities and richer kernel families.

Abstract

Inspired by the data-efficient spiking mechanism of neurons in the human eye, event cameras were created to achieve high temporal resolution with minimal power and bandwidth requirements by emitting asynchronous, per-pixel intensity changes rather than conventional fixed-frame-rate images. Unlike retinal ganglion cells (RGCs) in the human eye, however, which integrate signals from multiple photoreceptors within a receptive field to extract spatio-temporal features, conventional event cameras do not leverage local spatial context when deciding which events to fire. Moreover, the eye contains around 20 different kinds of RGCs operating in parallel, each attuned to different features or conditions. Inspired by this biological design, we introduce Neural Ganglion Sensors, an extension of traditional event cameras that learns task-specific spatio-temporal retinal kernels (i.e., RGC "events"). We evaluate our design on two challenging tasks: video interpolation and optical flow. Our results demonstrate that our biologically inspired sensing improves performance relative to conventional event cameras while reducing overall event bandwidth. These findings highlight the promise of RGC-inspired event sensors for edge devices and other low-power, real-time applications requiring efficient, high-resolution visual streams.
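
To make the contrast between per-pixel events and receptive-field events concrete, the following is a minimal frame-based sketch, not the authors' implementation. It assumes events are fired when a spatially filtered log-intensity signal drifts from a per-pixel reference by more than a threshold, with at most one event per pixel per frame step. The function names (`conv2d_same`, `simulate_events`), the threshold value, and the hand-written center-surround kernel are all hypothetical choices for illustration; the paper learns its kernels and thresholds and uses a differentiable formulation that this hard-threshold sketch does not reproduce.

```python
import numpy as np

def conv2d_same(img, kernel):
    """Correlate img with kernel, replicating edge pixels so the output keeps img's shape."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def simulate_events(frames, kernel, threshold, eps=1e-3):
    """Simplified event simulation (assumption: at most one event per pixel per step).

    frames: (T, H, W) array of non-negative intensities.
    Returns a (T-1, H, W) int8 array with values in {-1, 0, +1}.
    """
    log_frames = np.log(frames + eps)
    ref = conv2d_same(log_frames[0], kernel)  # per-pixel reference level
    events = np.zeros((frames.shape[0] - 1,) + frames.shape[1:], dtype=np.int8)
    for t in range(1, frames.shape[0]):
        cur = conv2d_same(log_frames[t], kernel)
        diff = cur - ref
        pos = diff >= threshold
        neg = diff <= -threshold
        events[t - 1][pos] = 1
        events[t - 1][neg] = -1
        fired = pos | neg
        ref[fired] = cur[fired]  # reset the reference only where an event fired
    return events

# Per-pixel (DVS-like) kernel: a delta at the center, so no spatial context is used.
dvs_kernel = np.zeros((5, 5))
dvs_kernel[2, 2] = 1.0

# Hand-written zero-sum center-surround kernel (illustrative, not a learned RGC kernel).
cs_kernel = np.full((5, 5), -1.0 / 24.0)
cs_kernel[2, 2] = 1.0

# A uniform, scene-wide brightening between two frames.
H, W = 8, 8
frames = np.stack([np.full((H, W), 1.0), np.full((H, W), 2.0)])

dvs_events = simulate_events(frames, dvs_kernel, threshold=0.2)
cs_events = simulate_events(frames, cs_kernel, threshold=0.2)
print(np.count_nonzero(dvs_events), np.count_nonzero(cs_events))  # prints "64 0"
```

In this toy case the per-pixel kernel fires at every pixel, while the zero-sum kernel fires nowhere because a spatially uniform change cancels inside the receptive field. That is one simple way in which spatial context can cut event bandwidth; the learned kernels in the paper are chosen by the downstream task rather than by hand.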

Paper Structure

This paper contains 37 sections, 14 equations, 14 figures, 1 table.

Figures (14)

  • Figure 1: Analogy between Neural Ganglion Sensors and the human retina: On the left, we show a simplified diagram of different layers in the human retina. Light hits the photoreceptors (rods and cones), of which there are about 100 million per eye. The signals are transferred and modulated through Bipolar cells along with additional Horizontal and Amacrine cells. In the end, each of the roughly 1 million Retinal Ganglion Cells (RGCs) receives signals from a small area on the retina, not just from a single photoreceptor. These RGCs look at the pattern of information to decide whether to send a spike signal to the brain. Spatial and temporal pooling occurs in the first few layers of the retina, encoding the information into bandwidth-efficient spiking potentials. On the right, we show our proposed Neural Ganglion Sensor, an event camera augmented to better match the human retina.
  • Figure 2: Video Interpolation Performance vs Bandwidth Trade-off. We perform video interpolation using DVS, CSDVS, CSDVS-Delbrück, RGC-log (learned, log regime), RGC-lin (learned, linear space), and RGC-lin-sv (learned, linear space, and spatially varying). For any given bandwidth, RGC-lin-sv provides the best performance.
  • Figure 3: Video Interpolation Qualitative Results. For each scene, we compare the reconstructions of the middle frame in the sequence for DVS, CSDVS, and RGC-lin. The top row shows the generated events, binned into the corresponding middle time bin; the second row shows the predicted image; and the bottom row shows zoom-ins. The right-most column shows the alpha-blended start and end frames, the ground-truth frame, and zoom-ins. PSNR($\uparrow$) and SSIM($\uparrow$) metrics are shown for each reconstruction.
  • Figure 4: Optical Flow Qualitative Results. For each sample, the top row shows the events generated by the DVS kernel and our learned RGC-lin kernel, as well as the alpha-blended camera frames for reference. In this task, only events are used to reconstruct the flow. The bottom row shows the reconstructed flows and the ground-truth flow. We show EPE($\downarrow$), 1PE($\downarrow$), and 3PE($\downarrow$) metrics for the reconstructions.
  • Figure 5: Comparison of Kernels. We show the $5\times5$ kernels for DVS, CSDVS, the learned RGC-lin kernels for video interpolation, and the learned RGC-lin kernels for optical flow.
  • ...and 9 more figures