NeuroCLIP: Neuromorphic Data Understanding by CLIP and SNN

Yufei Guo, Yuanpei Chen, Zhe Ma

TL;DR

This work addresses zero-shot and few-shot recognition of neuromorphic event data, which is challenging because the data consist of asynchronous spikes and large training benchmarks are hard to build. It proposes NeuroCLIP, which transfers CLIP’s 2D pretraining to neuromorphic streams by converting spikes into multi-timestep frames and aligning them with CLIP’s image–text embeddings. A novel inter-timestep adapter based on a spiking neural network captures temporal dynamics to boost few-shot performance. Experiments on N-MNIST, CIFAR10-DVS, and ES-ImageNet show meaningful zero-shot results and sizable few-shot gains, with performance depending on backbone choice and prompt design; code for NeuroCLIP is openly available.

Abstract

Neuromorphic vision sensors have recently attracted growing interest. However, neuromorphic data consist of asynchronous event spikes, which makes it difficult to construct a large benchmark for training a powerful general-purpose neural network model, and this in turn limits how well deep learning can understand neuromorphic data for "unseen" objects. For frame images, by contrast, training data are easy to obtain, and zero-shot and few-shot learning on "unseen" tasks via the large Contrastive Vision-Language Pre-training (CLIP) model, which is pre-trained on large-scale 2D image-text pairs, has shown inspiring performance. We ask whether CLIP can be transferred to neuromorphic data recognition to handle the "unseen" problem. To this end, we realize this idea with NeuroCLIP. NeuroCLIP consists of 2D CLIP and two modules designed specifically for neuromorphic data understanding. The first is an event-frame module that converts event spikes into sequential frame images with a simple discrimination strategy. The second is an inter-timestep adapter, a simple fine-tuned adapter based on a spiking neural network (SNN) that operates on the sequential features from CLIP's visual encoder to improve few-shot performance. Experiments on neuromorphic datasets including N-MNIST, CIFAR10-DVS, and ES-ImageNet demonstrate the effectiveness of NeuroCLIP. Our code is open-sourced at https://github.com/yfguo91/NeuroCLIP.git.
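
The event-to-frame step mentioned in the abstract can be pictured with a minimal sketch. It assumes each event is a tuple `(x, y, t, p)` and counts events per pixel and polarity within equal time windows. The function name `events_to_frames`, the tuple layout, and the equal-width binning are illustrative assumptions; the abstract does not detail the paper's "discrimination strategy", so this is not a reproduction of it.

```python
from typing import List, Sequence, Tuple

# Hypothetical event layout: (x, y, timestamp, polarity), polarity in {0, 1}.
Event = Tuple[int, int, float, int]


def events_to_frames(
    events: Sequence[Event], height: int, width: int, num_steps: int
) -> List[List[List[List[int]]]]:
    """Bin events into `num_steps` equal time windows.

    Returns nested lists indexed as frames[step][polarity][y][x],
    where each entry is the number of events that landed there.
    """
    frames = [
        [[[0] * width for _ in range(height)] for _ in range(2)]
        for _ in range(num_steps)
    ]
    if not events:
        return frames

    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1.0  # avoid division by zero for a single instant

    for x, y, t, p in events:
        # Map the timestamp to a window index; the last event falls in the last window.
        step = min(int((t - t_min) / span * num_steps), num_steps - 1)
        frames[step][p][y][x] += 1
    return frames


if __name__ == "__main__":
    demo = [(0, 0, 0.0, 1), (1, 0, 0.4, 0), (1, 1, 0.9, 1), (1, 1, 1.0, 1)]
    for step, frame in enumerate(events_to_frames(demo, 2, 2, 2)):
        print(step, frame)
```

Before such frames could be passed to a 2D image encoder, they would still need to be rendered as images at the encoder's input resolution; that step is omitted here.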

Paper Structure (9 sections, 8 equations, 2 figures, 2 tables)

This paper contains 9 sections, 8 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The framework of NeuroCLIP. NeuroCLIP first projects the neuromorphic data stream onto multi-timestep frames and then performs neuromorphic data recognition with CLIP pre-trained in 2D. For better few-shot classification, it also provides a learnable inter-timestep adapter based on a spiking neural network.
  • Figure 2: The detailed structure of the proposed inter-timestep adapter.
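
Since the figures themselves are not reproduced here, the following is only a rough sketch of how a spiking unit could carry information across per-timestep features before matching against text embeddings. It uses a standard leaky integrate-and-fire (LIF) update with a hard reset. The function names, the `decay`, `threshold`, and `alpha` values, and the residual fusion rule are all assumptions made for illustration. The learnable layers of the actual adapter are left out, so this should not be read as the structure shown in Figure 2.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def lif_over_timesteps(
    features: Sequence[Vector], decay: float = 0.5, threshold: float = 1.0
) -> List[List[float]]:
    """Run one LIF neuron per feature channel along the timestep axis."""
    membrane = [0.0] * len(features[0])
    spikes = []
    for feat in features:
        # Leaky integration of the current timestep's feature.
        membrane = [decay * u + x for u, x in zip(membrane, feat)]
        fired = [1.0 if u >= threshold else 0.0 for u in membrane]
        # Hard reset for channels that fired.
        membrane = [0.0 if s else u for u, s in zip(membrane, fired)]
        spikes.append(fired)
    return spikes


def fuse(
    features: Sequence[Vector], spikes: Sequence[Vector], alpha: float = 0.5
) -> List[float]:
    """Assumed residual blend: time-averaged feature plus alpha times firing rate."""
    steps, dim = len(features), len(features[0])
    return [
        sum(features[t][d] for t in range(steps)) / steps
        + alpha * sum(spikes[t][d] for t in range(steps)) / steps
        for d in range(dim)
    ]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(fused: Vector, text_embeddings: Sequence[Vector]) -> int:
    """Return the index of the text embedding most similar to the fused feature."""
    scores = [cosine(fused, emb) for emb in text_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    feats = [[0.6, 0.1], [0.6, 0.1], [0.6, 0.1]]  # toy per-timestep features
    spk = lif_over_timesteps(feats)
    print(spk, classify(fuse(feats, spk), [[1.0, 0.0], [0.0, 1.0]]))
```

In this toy setup the spike train depends on the order and accumulation of the inputs across timesteps, which is the kind of temporal information that a plain average over frames would discard.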