Predictive Temporal Attention on Event-based Video Stream for Energy-efficient Situation Awareness
Yiming Bu, Jiayang Liu, Qinru Qiu
TL;DR
The paper tackles energy efficiency in DVS-based vision by gating camera output through predictive temporal attention. It introduces an SNN-ANN hybrid autoencoder predictor paired with an evaluator-based gating mechanism, and defines an Event Similarity metric, Esim(F1, F2) = |F1 ∩ F2| / |F1 ∪ F2|, to quantify prediction quality; a Region Esim variant extends it to tolerate noise and small spatial shifts. Empirically, the approach reduces data communication by 46.7% and computation by 43.8% while maintaining situation awareness: the predictor filters out noisy events, and the evaluator-guided gating adapts to prediction quality. The method is validated across multiple datasets, demonstrating energy savings and robustness in event-based perception systems.
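As a minimal sketch of the Esim formula above, the following Python function computes the ratio of intersection to union over two event frames. It assumes each frame is a 2D array in which any nonzero entry marks a pixel with an event; that representation, the function name `esim`, and the choice to return 1.0 for two empty frames are illustrative assumptions, not details taken from the paper. Region Esim is not covered here.

```python
import numpy as np


def esim(f1, f2):
    """Event similarity |F1 ∩ F2| / |F1 ∪ F2| over two event frames.

    Assumption: a frame is an array where nonzero means "event at this pixel".
    """
    a = np.asarray(f1) != 0
    b = np.asarray(f2) != 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two frames with no events at all count as identical.
        return 1.0
    return float(np.logical_and(a, b).sum() / union)


# Hypothetical 2x2 frames: 2 shared event pixels, 4 event pixels in total.
predicted = np.array([[1, 0], [1, 1]])
actual = np.array([[1, 1], [0, 1]])
print(esim(predicted, actual))  # 0.5
```

A score near 1 means the predicted frame matches the observed events closely, while a score near 0 means the prediction missed most of them.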
Abstract
The Dynamic Vision Sensor (DVS) is an innovative technology that efficiently captures and encodes visual information in an event-driven manner. When it is combined with event-driven neuromorphic processing, the sparsity of the DVS camera output can yield high energy efficiency. However, as in many embedded systems, the off-chip communication between the camera and the processor is a bottleneck in terms of power consumption. Inspired by the predictive coding model and the expectation suppression phenomenon found in the human brain, we propose a temporal attention mechanism that throttles the camera output and attends to it only when the visual events cannot be well predicted. The predictive attention not only reduces power consumption at the sensor-processor interface but also decreases the computational workload by filtering out noisy events. We demonstrate that the predictive attention reduces data communication between the camera and the processor by 46.7% and computation in the processor by 43.8%.
