EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition
Xu Zheng, Lin Wang
TL;DR
EventDance tackles the challenging problem of unsupervised, source-free cross-modal adaptation from images to event data. It introduces a Reconstruction-based Modality Bridging (RMB) module to synthesize surrogate image frames from events and a Multi-representation Knowledge Adaptation (MKA) module to transfer knowledge using multiple event representations, guided by losses that promote intra- and inter-modal consistency. By deriving pseudo labels from a pre-trained image model applied to an anchor surrogate and enforcing cross-representation and cross-modal alignment, EventDance achieves competitive accuracy on three event-based benchmarks without accessing source images, approaching the performance of methods that do use source data. The framework also extends to edge-map-to-voxel-grid transfer, highlighting its flexibility for privacy-preserving, cross-modal vision tasks in event-based sensing.
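The pseudo-labeling and consistency idea summarized above can be illustrated with a minimal NumPy sketch. It is not the paper's loss formulation: the function names, the use of argmax pseudo labels, and the symmetric KL term are assumptions chosen only to show the general pattern of supervising target models with source-model predictions while keeping predictions from different event representations in agreement.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def pseudo_labels(source_logits):
    """Hard pseudo labels: the source model's top class on each surrogate image."""
    return source_logits.argmax(axis=1)

def cross_entropy(target_logits, labels):
    """Mean cross-entropy of a target model's predictions against pseudo labels."""
    probs = softmax(target_logits)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def consistency(logits_a, logits_b):
    """Symmetric KL divergence between two prediction sets (an assumed choice)."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    log_ratio = np.log(pa + 1e-12) - np.log(pb + 1e-12)
    kl_ab = (pa * log_ratio).sum(axis=1)
    kl_ba = (pb * -log_ratio).sum(axis=1)
    return float(0.5 * (kl_ab + kl_ba).mean())

# Hypothetical logits: source model on surrogate frames, and two target models
# that each consume a different event representation of the same samples.
source_logits = np.array([[2.0, 0.0], [0.0, 3.0]])
target_a = np.array([[1.5, 0.2], [0.1, 2.0]])
target_b = np.array([[1.0, 0.5], [0.3, 1.8]])

labels = pseudo_labels(source_logits)
total = cross_entropy(target_a, labels) + cross_entropy(target_b, labels) \
    + consistency(target_a, target_b)
```

In this sketch the first two terms play the role of cross-modal supervision and the last term the role of cross-representation agreement; any weighting between them would be a further design choice.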
Abstract
In this paper, we make the first attempt at cross-modal (i.e., image-to-events) adaptation for event-based object recognition without accessing any labeled source image data, owing to privacy and commercial concerns. Tackling this novel problem is non-trivial due to the novelty of event cameras and the distinct modality gap between images and events. In particular, as only the source model is available, a key hurdle is how to extract knowledge from the source model using only the unlabeled target event data while still achieving knowledge transfer. To this end, we propose a novel framework, dubbed EventDance, for this unsupervised source-free cross-modal adaptation problem. Inspired by event-to-video reconstruction methods, we propose a reconstruction-based modality bridging (RMB) module, which reconstructs intensity frames from events in a self-supervised manner. This makes it possible to build surrogate images from which knowledge (i.e., labels) can be extracted with the source model. We then propose a multi-representation knowledge adaptation (MKA) module that transfers this knowledge to target models that learn from events in multiple representation types, so as to fully explore the spatiotemporal information of events. The two modules connecting the source and target models are mutually updated to achieve the best performance. Experiments on three benchmark datasets with two adaptation settings show that EventDance is on par with prior methods that utilize the source data.
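To make "multiple representation types" concrete, the following NumPy sketch converts one raw event stream into two commonly used tensors: a per-polarity event count image and a temporally binned voxel grid. The abstract does not specify which representations or binning scheme the method uses, so the function names, the nearest-bin assignment, and the toy events are all illustrative assumptions.

```python
import numpy as np

def event_count_image(x, y, p, height, width):
    """Count events per pixel in two channels: 0 = negative, 1 = positive polarity."""
    img = np.zeros((2, height, width), dtype=np.float32)
    chan = (p > 0).astype(np.int64)
    # np.add.at accumulates correctly when the same pixel appears more than once.
    np.add.at(img, (chan, y, x), 1.0)
    return img

def voxel_grid(x, y, t, p, height, width, num_bins):
    """Sum signed polarities into num_bins temporal slices (nearest-bin assignment)."""
    grid = np.zeros((num_bins, height, width), dtype=np.float32)
    t = t.astype(np.float64)
    span = t.max() - t.min()
    if span == 0:
        bins = np.zeros(len(t), dtype=np.int64)
    else:
        scaled = (t - t.min()) / span * num_bins
        bins = np.minimum(scaled.astype(np.int64), num_bins - 1)
    polarity = np.where(p > 0, 1.0, -1.0)
    np.add.at(grid, (bins, y, x), polarity)
    return grid

# Three toy events on a 1x2 sensor: (x, y, timestamp, polarity).
x = np.array([0, 1, 1])
y = np.array([0, 0, 0])
t = np.array([0.0, 0.5, 1.0])
p = np.array([1, -1, 1])

counts = event_count_image(x, y, p, height=1, width=2)
voxels = voxel_grid(x, y, t, p, height=1, width=2, num_bins=2)
```

The count image discards timing but keeps polarity statistics, whereas the voxel grid preserves coarse temporal ordering; feeding both to separate target models is one way such complementary views could be exploited.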
