A Joint Visual Compression and Perception Framework for Neuralmorphic Spiking Camera

Kexiang Feng, Chuanmin Jia, Siwei Ma, Wen Gao

TL;DR

The paper tackles efficient compression of neuralmorphic spike data while preserving downstream analysis. It introduces Spike Coding for Intelligence (SCI) and a compress-and-analyze-simultaneously (CAAS) paradigm built on a dual-pathway spike processing architecture with a Pathway Fusion Unit, Feature-level Motion Vector Refinement (FMVR), and Associated Feature Regression (AFR) for robust short- and long-term feature handling. The approach achieves state-of-the-art results, including a 17.25% BD-rate reduction for spike compression and a 4.3% accuracy improvement over SpiReco for spike-based classification, with substantial encoder-side complexity and latency reductions that enable practical end-cloud deployment. Collectively, this work advances spike-based visual intelligence by jointly optimizing compression and analytics for ultra-high-temporal-resolution spike data.

Abstract

The advent of neuralmorphic spike cameras has garnered significant attention for their ability to capture continuous motion with unparalleled temporal resolution. However, this imaging attribute necessitates considerable resources for binary spike data storage and transmission. In light of compression and spike-driven intelligent applications, we present the notion of Spike Coding for Intelligence (SCI), wherein spike sequences are compressed and optimized for both bit-rate and task performance. Drawing inspiration from the mammalian vision system, we propose a dual-pathway architecture for separate processing of spatial semantics and motion information, which is then merged to produce features for compression. A refinement scheme is also introduced to ensure consistency between decoded features and motion vectors. We further propose a temporal regression approach that integrates various motion dynamics, capitalizing on the advancements in warping and deformation simultaneously. Comprehensive experiments demonstrate our scheme achieves state-of-the-art (SOTA) performance for spike compression and analysis. We achieve an average 17.25% BD-rate reduction compared to SOTA codecs and a 4.3% accuracy improvement over SpiReco for spike-based classification, with 88.26% complexity reduction and 42.41% inference time saving on the encoding side.
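For readers unfamiliar with the BD-rate figure quoted above, the following is a minimal sketch of the standard Bjøntegaard delta-rate calculation: fit a cubic polynomial to log bit-rate as a function of quality for each codec, average both fits over the shared quality range, and convert the difference into a percentage. The function name `bd_rate`, the choice of quality axis, and the sample numbers are illustrative assumptions; this summary does not state which quality metric or rate points the paper uses.

```python
import numpy as np


def bd_rate(rate_anchor, quality_anchor, rate_test, quality_test):
    """Average bit-rate change (%) of the test codec vs. the anchor at equal quality.

    Negative values mean the test codec needs fewer bits.
    Hypothetical helper for illustration, not code from the paper.
    """
    log_ra = np.log(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log(np.asarray(rate_test, dtype=float))
    qa = np.asarray(quality_anchor, dtype=float)
    qt = np.asarray(quality_test, dtype=float)

    # Cubic fit of log-rate as a function of quality for each codec.
    poly_a = np.polyfit(qa, log_ra, 3)
    poly_t = np.polyfit(qt, log_rt, 3)

    # Only the overlapping quality interval is comparable.
    lo = max(qa.min(), qt.min())
    hi = min(qa.max(), qt.max())

    # Mean log-rate of each fit over [lo, hi] via the antiderivative.
    int_a = np.polyint(poly_a)
    int_t = np.polyint(poly_t)
    mean_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    mean_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)

    return (np.exp(mean_t - mean_a) - 1.0) * 100.0


# Made-up rate-quality points: the test codec uses 20% fewer bits everywhere.
anchor_rates = [100.0, 200.0, 400.0, 800.0]
qualities = [30.0, 33.0, 36.0, 39.0]
test_rates = [0.8 * r for r in anchor_rates]
example = bd_rate(anchor_rates, qualities, test_rates, qualities)
```

In this toy setup the result is approximately -20, since the test codec's rate is a constant 0.8 times the anchor's at every quality level. Under the same convention, a 17.25% BD-rate reduction corresponds to a value of about -17.25.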

Paper Structure

This paper contains 18 sections, 9 equations, 17 figures, 10 tables.

Figures (17)

  • Figure 1: Sketch of a spike vision application. On the end side, a spike camera captures scenes and generates binary spike sequences, from which high-level features are extracted. After compression and transmission, the decoded features are used for several applications on the cloud side.
  • Figure 2: Comparison of three paradigms for spike compression and analysis: (a) compress-then-analyze (CTA), (b) analyze-then-compress (ATC), and (c) compress-and-analyze-simultaneously (CAAS). Extensive experimental results show that the CAAS paradigm outperforms the other two, providing a novel perspective for SCI.
  • Figure 3: Framework of the proposed method, which compresses the spike sequence and analyzes it for downstream tasks simultaneously. The process is divided into three main modules. The short-term feature extraction module converts the spike sequence into a compression-friendly feature ($F_t\in [0,1]^{h\times w}$). The feature compression module encodes the feature sequence, eliminating intra- and inter-feature redundancies. The long-term feature analysis module optimizes downstream task performance using the decoded feature sequence and the refined motion vector sequence.
  • Figure 4: Detailed structure of (a) the PFU and (b) the FMVR. The PFU fuses spatial semantic information from the dorsal pathway with motion characteristics from the ventral pathway to generate compression-friendly features. The FMVR uses decoded features to constrain content consistency between the feature and latent domains.
  • Figure 5: Comparison of architecture and temporal receptive field between dual-pathway filters. The dorsal filter consists of sliding window-based temporal sub-filters, whereas the ventral filter comprises multi-scale temporal sub-filters.
  • ...and 12 more figures