Breaking the Modality Wall: Time-step Mixup for Efficient Spiking Knowledge Transfer from Static to Event Domain

Yuqi Xie, Shuhan Ye, Yi Yu, Chong Wang, Qixin Zhang, Jiazhen Xu, Le Shen, Yuanbin Qian, Jiangbo Qian, Guoqi Li

TL;DR

Time-step Mixup Knowledge Transfer (TMKT), a cross-modal training framework with a probabilistic Time-step Mixup (TSM) strategy, enables smoother knowledge transfer, helps mitigate modality mismatch during training, and achieves superior performance in spiking image classification tasks.

Abstract

The integration of event cameras and spiking neural networks (SNNs) promises energy-efficient visual intelligence, yet scarce event data and the sparsity of DVS outputs hinder effective training. Prior methods for transferring knowledge from RGB to DVS often underperform because the distribution gap between the modalities is substantial. In this work, we present Time-step Mixup Knowledge Transfer (TMKT), a cross-modal training framework with a probabilistic Time-step Mixup (TSM) strategy. TSM exploits the asynchronous nature of SNNs by interpolating RGB and DVS inputs at various time steps to produce a smooth curriculum within each sequence; our theoretical analysis indicates that this reduces gradient variance and stabilizes optimization. To exploit the auxiliary supervision that TSM provides, TMKT introduces two lightweight modality-aware objectives: Modality Aware Guidance (MAG) for per-frame source supervision and Mixup Ratio Perception (MRP) for sequence-level mix ratio estimation. Both explicitly align temporal features with the mixing schedule. TMKT enables smoother knowledge transfer, helps mitigate modality mismatch during training, and achieves superior performance in spiking image classification tasks. Extensive experiments across diverse benchmarks and multiple SNN backbones, together with ablations, demonstrate the effectiveness of our method.
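As a rough illustration of the idea described in the abstract, the sketch below mixes an RGB sequence and a DVS sequence per time step and derives the two auxiliary targets. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `time_step_mixup`, the single probability `p_rgb`, and the hard per-step selection (rather than a soft interpolation or a time-varying schedule) are all hypothetical choices made for illustration. It also assumes the RGB image has already been repeated over time and mapped to the same frame shape as the DVS frames.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")  # any frame representation, e.g. an array of shape (C, H, W)


def time_step_mixup(
    rgb_frames: Sequence[Frame],
    dvs_frames: Sequence[Frame],
    p_rgb: float,
    rng: random.Random,
) -> Tuple[List[Frame], List[int], float]:
    """Hypothetical per-time-step mixing of two aligned sequences.

    Assumption: each time step independently takes the RGB frame with
    probability p_rgb and the DVS frame otherwise.
    """
    if len(rgb_frames) != len(dvs_frames):
        raise ValueError("both sequences must have the same number of time steps")
    if len(dvs_frames) == 0:
        raise ValueError("sequences must contain at least one time step")
    if not 0.0 <= p_rgb <= 1.0:
        raise ValueError("p_rgb must lie in [0, 1]")

    # Per-frame source label (1 = RGB, 0 = DVS): a possible MAG-style target.
    source_labels = [1 if rng.random() < p_rgb else 0 for _ in dvs_frames]

    # Build the mixed sequence by picking one source per time step.
    mixed = [
        rgb if label == 1 else dvs
        for rgb, dvs, label in zip(rgb_frames, dvs_frames, source_labels)
    ]

    # Sequence-level fraction of RGB frames: a possible MRP-style target.
    mix_ratio = sum(source_labels) / len(source_labels)
    return mixed, source_labels, mix_ratio
```

In this sketch, lowering `p_rgb` over the course of training would shift the mixed sequences from mostly RGB toward mostly DVS; whether the actual schedule works this way is not stated in the abstract.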

Paper Structure

This paper contains 21 sections, 1 theorem, 44 equations, 4 figures, 5 tables.

Key Result

Theorem 3.2

By Assumption asm:batch, the two estimators share the same expectation: ... Their covariance matrices are ... Consequently, their difference can be written as ...

Figures (4)

  • Figure 1: Different paradigms for transferring static (RGB) knowledge to the event (DVS) domain. Fine-tuning on DVS data suffers from severe domain mismatch, causing the representations to hit the modality wall and fail to cross domains. Domain alignment alleviates this mismatch by explicitly pulling the DVS representations toward the RGB manifold. Our proposed Time-step Mixup (TSM) provides a smoother, causality-aligned transition across the modality wall, enabling stable cross-domain learning in the shared feature space.
  • Figure 2: The overview of our proposed Time-step Mixup Knowledge Transfer (TMKT) framework. TMKT employs a Time-step Mixup (TSM) strategy and introduces two auxiliary tasks: a modality-aware guidance label and a mixup ratio label to enhance the supervision of temporal knowledge transfer. Both the event stream and the Time-step Mixup stream are fed into the network simultaneously, sharing all weights except for the final layer. Membrane potentials from the penultimate layer are used for domain alignment.
  • Figure 3: Class Activation Mapping of N-Caltech101 (a)(b), CEP-DVS (c)(d), and their RGB counterparts. For each class, the top row shows static images, and the bottom row presents event data integrated into frames. Within each class, from left to right are: original input, baseline ekt result, and our result.
  • Figure 4: Visualization of the loss landscapes for our method and the baseline Knowledge-Transfer ekt.

Theorems & Definitions (2)

  • Theorem 3.2: Mean and covariance
  • Remark 3.3