EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition

Xu Zheng, Lin Wang

TL;DR

EventDance tackles unsupervised, source-free cross-modal adaptation from images to event data. It introduces a Reconstruction-based Modality Bridging (RMB) module that synthesizes surrogate image frames from events, and a Multi-representation Knowledge Adaptation (MKA) module that transfers knowledge using multiple event representations, guided by losses that promote intra- and inter-modal consistency. By deriving pseudo labels from a pre-trained image model applied to an anchor surrogate and enforcing cross-representation and cross-modal alignment, EventDance achieves competitive accuracy on three event-based benchmarks without accessing source images, approaching the performance of methods that do use source data. The framework also extends to edge-map-to-voxel-grid transfer, highlighting its flexibility for privacy-preserving, cross-modal vision tasks in event-based sensing.
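To make the notion of an event representation concrete, the following is a minimal sketch of one widely used representation, the voxel grid, which the summary above mentions. It assumes events arrive as arrays of pixel coordinates, timestamps, and polarities in {-1, +1}; the function name `events_to_voxel_grid` and its parameters are hypothetical, and this summary does not state which exact representations or binning scheme the authors use.

```python
import numpy as np

def events_to_voxel_grid(x, y, t, p, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) voxel grid.

    Assumption: polarity p is in {-1, +1}. Each event is split between its
    two nearest temporal bins using linear interpolation weights.
    """
    x = np.asarray(x, dtype=int)
    y = np.asarray(y, dtype=int)
    t = np.asarray(t, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)

    grid = np.zeros((num_bins, height, width), dtype=np.float64)
    if t.size == 0:
        return grid

    # Map timestamps to [0, num_bins - 1]; a zero time span puts all events in bin 0.
    span = t.max() - t.min()
    if span > 0:
        t_norm = (t - t.min()) / span * (num_bins - 1)
    else:
        t_norm = np.zeros_like(t)

    lower = np.floor(t_norm).astype(int)
    upper = np.minimum(lower + 1, num_bins - 1)
    w_upper = t_norm - lower
    w_lower = 1.0 - w_upper

    # np.add.at accumulates correctly even when several events hit the same cell.
    np.add.at(grid, (lower, y, x), p * w_lower)
    np.add.at(grid, (upper, y, x), p * w_upper)
    return grid
```

For example, `events_to_voxel_grid([0, 1, 1], [0, 0, 1], [0.0, 0.25, 1.0], [1, -1, 1], num_bins=3, height=2, width=2)` spreads the second event equally over the first two temporal bins at pixel (x=1, y=0).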

Abstract

In this paper, we make the first attempt at achieving cross-modal (i.e., image-to-events) adaptation for event-based object recognition without accessing any labeled source image data, owing to privacy and commercial issues. Tackling this novel problem is non-trivial due to the novelty of event cameras and the distinct modality gap between images and events. In particular, as only the source model is available, a hurdle is how to extract the knowledge from the source model using only the unlabeled target event data while achieving knowledge transfer. To this end, we propose a novel framework, dubbed EventDance, for this unsupervised source-free cross-modal adaptation problem. Importantly, inspired by event-to-video reconstruction methods, we propose a reconstruction-based modality bridging (RMB) module, which reconstructs intensity frames from events in a self-supervised manner. This makes it possible to build surrogate images from which to extract the knowledge (i.e., labels) of the source model. We then propose a multi-representation knowledge adaptation (MKA) module that transfers the knowledge to target models that learn from events in multiple representation types, so as to fully explore the spatiotemporal information of events. The two modules connecting the source and target models are mutually updated so as to achieve the best performance. Experiments on three benchmark datasets with two adaptation settings show that EventDance is on par with prior methods that utilize the source data.
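The knowledge-transfer idea described above can be illustrated with a small NumPy sketch: hard pseudo labels are taken from the frozen source model's logits on surrogate images, each target model (one per event representation) is trained against them, and predictions across representations are pulled together. This is an illustration under stated assumptions, not the authors' objective: the paper's actual loss terms are not given in this summary, and the names `pseudo_labels`, `adaptation_loss`, and `consistency_weight`, as well as the choice of a pairwise KL term, are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pseudo_labels(source_logits):
    """Hard pseudo labels from the source model's logits on surrogate images."""
    return source_logits.argmax(axis=1)

def cross_entropy(logits, labels):
    """Mean cross-entropy of logits against integer class labels."""
    probs = softmax(logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def kl_divergence(p_logits, q_logits):
    """Mean KL(p || q) between the softmax distributions of two logit arrays."""
    p, q = softmax(p_logits), softmax(q_logits)
    return float((p * (np.log(p + 1e-12) - np.log(q + 1e-12))).sum(axis=1).mean())

def adaptation_loss(source_logits, target_logits_per_rep, consistency_weight=1.0):
    """Pseudo-label loss per representation plus pairwise consistency (assumed form)."""
    labels = pseudo_labels(source_logits)
    supervised = sum(cross_entropy(l, labels) for l in target_logits_per_rep)
    consistency = 0.0
    n = len(target_logits_per_rep)
    for i in range(n):
        for j in range(n):
            if i != j:
                consistency += kl_divergence(target_logits_per_rep[i],
                                             target_logits_per_rep[j])
    return supervised + consistency_weight * consistency
```

In a real training loop the logits would come from neural networks and the loss would be minimized with an autodiff framework; plain arrays are used here only to keep the arithmetic visible.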

Paper Structure (16 sections, 6 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: Illustration of the challenging task of the cross-modal adaptation from image to event modalities. We address it by introducing reconstruction-based modality bridging and multi-representation knowledge adaptation modules.
  • Figure 2: Different adaptation settings. (a) Ours: adaptation from the image modality to the event modality. (b) Source-free unsupervised domain adaptation (SFUDA) between different image types [CTN].
  • Figure 3: Overview of our proposed framework. RMB: reconstruction-based modality bridging module; MKA: multi-representation knowledge adaptation module.
  • Figure 4: Visualization of samples in source and surrogate data in the image modality.
  • Figure 5: Illustration of the MKA module.
  • ...and 2 more figures