EvRepSL: Event-Stream Representation via Self-Supervised Learning for Event-Based Vision
Qiang Qu, Xiaoming Chen, Yuk Ying Chung, Yiran Shen
TL;DR
This paper introduces EvRepSL, an event-stream representation for event-based vision that is learned through self-supervision. It starts from EvRep, a three-channel representation built from spatial-temporal statistics, and derives a theoretical relationship between asynchronous event-streams and synchronous video frames. That relationship is used to train a representation generator, RepGen, in a self-supervised manner; once trained, RepGen converts EvRep into EvRepSL without fine-tuning or retraining. In experiments on event-based classification and optical flow datasets captured with various event cameras, EvRepSL outperforms existing event-stream representations while remaining agnostic to camera type and downstream task, which makes it a versatile foundation for future event-based vision systems.
Abstract
Event-stream representation is the first step for many computer vision tasks using event cameras. It converts asynchronous event-streams into a formatted structure so that conventional machine learning models can be applied easily. However, most state-of-the-art event-stream representations are manually designed, and their quality cannot be guaranteed because event-streams are inherently noisy. In this paper, we introduce a data-driven approach for enhancing the quality of event-stream representations. Our approach begins with a new event-stream representation based on spatial-temporal statistics, denoted EvRep. We then theoretically derive the intrinsic relationship between asynchronous event-streams and synchronous video frames. Building on this relationship, we train a representation generator, RepGen, in a self-supervised manner, with EvRep as its input. Finally, event-streams are converted to high-quality representations, termed EvRepSL, by passing them through the learned RepGen, without the need for fine-tuning or retraining. We validate our method through extensive evaluations on a variety of mainstream event-based classification and optical flow datasets captured with various types of event cameras. The experimental results highlight not only our approach's superior performance over existing event-stream representations but also its versatility: it is agnostic to the event camera and the task.
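The abstract describes EvRep only as a representation built from spatial-temporal statistics of the event-stream. As a minimal sketch of what a three-channel representation of that kind can look like, the Python snippet below accumulates events given as (x, y, timestamp, polarity) into per-pixel statistics. The specific channels chosen here (event count, polarity sum, and standard deviation of the time gaps between consecutive events at a pixel), the function name `event_statistics_representation`, and the argument layout are assumptions made for illustration; they are not taken from the text above and should not be read as the authors' implementation. RepGen, the learned generator, is not sketched.

```python
import numpy as np


def event_statistics_representation(x, y, t, p, height, width):
    """Accumulate events into a (3, height, width) array of per-pixel statistics.

    Illustrative channel choice (an assumption, not the paper's definition):
      0: number of events at the pixel
      1: sum of event polarities (+1 / -1) at the pixel
      2: population standard deviation of the time gaps between
         consecutive events at the pixel (0 if fewer than two events)
    """
    x = np.asarray(x, dtype=np.int64)
    y = np.asarray(y, dtype=np.int64)
    t = np.asarray(t, dtype=np.float64)
    p = np.asarray(p, dtype=np.float64)
    n_pixels = height * width
    pixel = y * width + x  # flat pixel index of every event

    # Channel 0: event count per pixel.
    count = np.bincount(pixel, minlength=n_pixels).astype(np.float64)
    # Channel 1: signed polarity sum per pixel.
    polarity_sum = np.bincount(pixel, weights=p, minlength=n_pixels)

    # Channel 2: sort by pixel, then by timestamp, so that consecutive
    # entries with the same pixel index are consecutive events in time.
    order = np.lexsort((t, pixel))
    pixel_sorted, t_sorted = pixel[order], t[order]
    same_pixel = pixel_sorted[1:] == pixel_sorted[:-1]
    gaps = np.diff(t_sorted)[same_pixel]
    gap_pixel = pixel_sorted[1:][same_pixel]

    n_gaps = np.bincount(gap_pixel, minlength=n_pixels)
    gap_sum = np.bincount(gap_pixel, weights=gaps, minlength=n_pixels)
    gap_sq_sum = np.bincount(gap_pixel, weights=gaps**2, minlength=n_pixels)

    safe_n = np.maximum(n_gaps, 1)  # avoid division by zero at quiet pixels
    mean_gap = gap_sum / safe_n
    variance = np.maximum(gap_sq_sum / safe_n - mean_gap**2, 0.0)
    gap_std = np.sqrt(variance)

    return np.stack([count, polarity_sum, gap_std]).reshape(3, height, width)
```

For example, three events at one pixel with timestamps 0, 1 and 3 produce gaps of 1 and 2, so the third channel holds 0.5 at that pixel. The output has a fixed shape regardless of how many events arrive, which is what lets a conventional frame-based model consume it.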
