Event USKT : U-State Space Model in Knowledge Transfer for Event Cameras

Yuhui Lin, Jiahao Zhang, Siyuan Li, Jimin Xiao, Ding Xu, Wenjun Wu, Jiaxuan Lu

TL;DR

A tailored U-shaped State Space Model Knowledge Transfer (USKT) framework for Event-to-RGB knowledge transfer is introduced, enabling event data to effectively reuse pre-trained RGB models and achieve competitive performance with minimal parameter tuning.

Abstract

Event cameras, as an emerging imaging technology, offer distinct advantages over traditional RGB cameras, including reduced energy consumption and higher frame rates. However, the limited quantity of available event data presents a significant challenge, hindering their broader development. To alleviate this issue, we introduce a tailored U-shaped State Space Model Knowledge Transfer (USKT) framework for Event-to-RGB knowledge transfer. This framework generates inputs compatible with RGB frames, enabling event data to effectively reuse pre-trained RGB models and achieve competitive performance with minimal parameter tuning. Within the USKT architecture, we also propose a bidirectional reverse state space model. Unlike conventional bidirectional scanning mechanisms, the proposed Bidirectional Reverse State Space Model (BiR-SSM) leverages a shared weight strategy, which facilitates efficient modeling while conserving computational resources. In terms of effectiveness, integrating USKT with ResNet50 as the backbone improves model performance by 0.95%, 3.57%, and 2.9% on DVS128 Gesture, N-Caltech101, and CIFAR-10-DVS datasets, respectively, underscoring USKT's adaptability and effectiveness. The code will be made available upon acceptance.

Paper Structure

This paper contains 27 sections, 8 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: The proposed U-shaped State Space Model Knowledge Transfer (USKT) framework with the BiR-SSM module combines reconstruction and classification losses for Event-to-RGB feature adaptation, enabling the reuse of the pre-trained RGB encoder.
  • Figure 2: Overview of the USKT framework. The method is built on a U-shaped network. Event data are first mapped into channels suitable for USKT input through time accumulation. A downsampling stage then increases the data dimension while reducing its size. A Bidirectional Reverse State Space Model (BiR-SSM) performs sequence modeling, after which an upsampling stage restores the data to its original resolution. Finally, a reconstruction loss is introduced to enhance classification accuracy.
  • Figure 3: The figure on the left shows the traditional Bi-SSM, while the figure on the right represents our proposed BiR-SSM.
  • Figure 4: Left: performance of ResNet50 with different numbers of SSM layers in USKT. Right: performance of ResNet50 with different values of $\lambda_2$. Both panels show top-1 accuracy on DVS128 Gesture, N-Caltech101, and CIFAR-10-DVS.
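
Two steps named in the captions above, time accumulation of events and bidirectional scanning with shared weights, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The event layout `(x, y, t, polarity)`, the function names, the uniform time binning, the scalar recurrence, and the summation of the two scan directions are all assumptions made for illustration. The actual BiR-SSM design is only described at a high level here.

```python
def accumulate_events(events, height, width, num_bins):
    """Accumulate (x, y, t, polarity) events into num_bins count frames.

    Assumed layout: frames[bin][polarity_channel][y][x], with channel 1 for
    positive polarity and channel 0 for negative polarity.
    """
    if not events:
        raise ValueError("events must be non-empty")
    t_min = min(e[2] for e in events)
    t_max = max(e[2] for e in events)
    span = (t_max - t_min) or 1  # avoid division by zero if all timestamps match
    frames = [[[[0] * width for _ in range(height)] for _ in range(2)]
              for _ in range(num_bins)]
    for x, y, t, p in events:
        # Uniform time binning; the last timestamp is clamped into the final bin.
        b = min(int((t - t_min) / span * num_bins), num_bins - 1)
        frames[b][1 if p > 0 else 0][y][x] += 1
    return frames


def linear_scan(xs, a, b, c):
    """Scalar linear state space recurrence: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t."""
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def shared_bidirectional_scan(xs, a, b, c):
    """Scan the sequence forward and reversed with the SAME parameters (a, b, c).

    The reversed output is flipped back so both directions align per position,
    then the two are summed (the fusion rule is an assumption).
    """
    forward = linear_scan(xs, a, b, c)
    backward = linear_scan(xs[::-1], a, b, c)[::-1]
    return [f + r for f, r in zip(forward, backward)]


if __name__ == "__main__":
    demo_events = [(0, 0, 0, 1), (1, 0, 5, -1), (1, 1, 10, 1)]
    print(accumulate_events(demo_events, height=2, width=2, num_bins=2))
    print(shared_bidirectional_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
```

Reusing one parameter set for both directions means the reverse pass adds no parameters over a single-direction scan, which is the kind of saving a shared-weight strategy aims for.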