Wandering around: A bioinspired approach to visual attention through object motion sensitivity

Giulia D'Angelo, Victoria Clerico, Chiara Bartolozzi, Matej Hoffmann, P. Michael Furlong, Alexander Hadjiivanov

TL;DR

The paper addresses real-time, low-power visual perception in dynamic environments. It proposes an end-to-end bioinspired attention system that leverages event-based sensing and neuromorphic computation. A Spiking Object Motion Sensitivity (sOMS) module is combined with a Spiking Neural Network proto-object model to produce a saliency map. This map guides a Spiking Attention Control that performs saccades toward salient objects, after which fixational eye movements reveal the next focal point. The approach is learning-free and hardware-oriented, and it is demonstrated on the Speck neuromorphic platform with a Pan-Tilt Unit. It achieves a mean IoU of 82.2% and a mean SSIM of 96% on EVIMO, object detection accuracies of 88.8% in office scenarios and 89.8% in low-light conditions, and a real-time response of about 0.12 s. This work demonstrates robust motion segmentation and attention across diverse conditions. It offers a foundation for fully neuromorphic, real-time robotic perception that does not require large training datasets and could be deployed end to end on hardware.
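The TL;DR describes a saliency map steering saccades. The sketch below is a rough, non-spiking illustration of that last step. It picks the saliency-map maximum $P(x,y)$ and converts its offset from the image center into pan and tilt angles. The linear pixel-to-angle mapping, the 60° field of view, and the tilt sign convention are all assumptions made for illustration. The paper's Spiking Attention Control is a spiking mechanism, and its details are not given here.

```python
import numpy as np


def most_salient_point(saliency: np.ndarray) -> tuple[int, int]:
    """Return (x, y) pixel coordinates of the saliency-map maximum."""
    # argmax works on the flattened array; unravel it back to (row, col).
    row, col = np.unravel_index(np.argmax(saliency), saliency.shape)
    return int(col), int(row)


def saccade_command(point, shape, fov_deg=(60.0, 60.0)):
    """Map a pixel target to pan/tilt angle offsets in degrees.

    Hypothetical stand-in for the spiking attention control: assumes a
    linear pixel-to-angle mapping over an assumed field of view.
    """
    height, width = shape
    x, y = point
    # Offset of the target from the image center, as a fraction of the image size.
    dx = (x - (width - 1) / 2) / width
    dy = (y - (height - 1) / 2) / height
    u_pan = dx * fov_deg[0]
    # Image rows grow downward; "tilt up is positive" is an assumed convention.
    u_tilt = -dy * fov_deg[1]
    return u_pan, u_tilt
```

Take a 128×128 map, used here only as an example size, whose peak sits at column 96, row 32. For that map, `saccade_command` returns about 15.2° of pan and 14.8° of tilt. A saccade of that size would bring the target toward the center of the visual field.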

Abstract

Active vision enables dynamic visual perception, offering an alternative to static feedforward architectures in computer vision, which rely on large datasets and high computational resources. Biological selective attention mechanisms allow agents to focus on salient Regions of Interest (ROIs), reducing computational demand while maintaining real-time responsiveness. Event-based cameras, inspired by the mammalian retina, enhance this capability by capturing asynchronous scene changes, enabling efficient low-latency processing. To distinguish moving objects while the event-based camera is itself in motion, the agent requires an object motion segmentation mechanism to accurately detect targets and center them in the visual field (fovea). Integrating event-based sensors with neuromorphic algorithms represents a paradigm shift, using Spiking Neural Networks to parallelize computation and adapt to dynamic environments. This work presents a bioinspired attention system, built on a Spiking Convolutional Neural Network, for selective attention through object motion sensitivity. The system uses a Dynamic Vision Sensor integrated into the Speck neuromorphic hardware and mounted on a Pan-Tilt Unit. Fixational eye movements generate events, from which the system identifies the ROI and then saccades toward it. The system was characterized using ideal gratings and benchmarked against the Event Camera Motion Segmentation Dataset, where it reaches a mean IoU of 82.2% and a mean SSIM of 96% in multi-object motion segmentation. The detection of salient objects reaches 88.8% accuracy in office scenarios and 89.8% in low-light conditions on the Event-Assisted Low-Light Video Object Segmentation Dataset. A real-time demonstrator shows that the system responds to dynamic scenes in 0.12 s. Its learning-free design ensures robustness across perceptual scenes, making it a reliable foundation for real-time robotic applications and a basis for more complex architectures.
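The segmentation results above are reported as mean IoU. The sketch below computes that metric for binary masks. It assumes one predicted and one ground-truth mask per frame and takes a simple average across frames. The paper's exact averaging over objects and sequences may differ.

```python
import numpy as np


def iou(pred, target) -> float:
    """Intersection over Union of two binary masks with the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks empty: scored as a perfect match (a convention chosen here).
        return 1.0
    return float(np.logical_and(pred, target).sum() / union)


def mean_iou(preds, targets) -> float:
    """Average IoU over paired masks, e.g. one (prediction, ground truth) pair per frame."""
    if len(preds) != len(targets):
        raise ValueError("need one ground-truth mask per prediction")
    return float(np.mean([iou(p, t) for p, t in zip(preds, targets)]))
```

Scoring two empty masks as 1.0 is a choice made in this sketch, not a fixed standard. SSIM, the other reported metric, is available in scikit-image as `skimage.metrics.structural_similarity`.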

Paper Structure

This paper contains 19 sections, 16 equations, 13 figures, 3 tables, and 1 algorithm.

Figures (13)

  • Figure 1: Overview of the system. Reading from left to right, events from the Dynamic Vision Sensor (DVS) integrated into the Speck device enter the object motion sensitivity module, where the spiking Object Motion Sensitive (sOMS) model processes them. This model generates the Object Motion Segmentation (OMS) map, which is fed into the proto-object detection module. There, the Spiking Neural Network (SNN) Proto-object model processes the map, producing the final saliency map and identifying the most salient object (red circle, $P(x,y)$). This triggers the Spiking Attention Control (sAC) mechanism in the visual attention module. The sAC generates pan and tilt control signals ($u_{pan}, u_{tilt}$) that drive a FLIR Pan-Tilt Unit (PTU) to saccade toward the salient object. The system then performs fixational eye movements to identify the next salient point, closing the loop. See the accompanying video for a demonstration of the system.
  • Figure 2: View of the (a) center and (b) surround Gaussian kernels of the Object Motion Sensitivity (OMS) model.
  • Figure 3: From left to right: Experiment, Parameters, RGB stimuli, Event map and OMS map in three situations: Eye+object, Eye only and Object only. Here, $fs$ is the spatial frequency and $s$ is the speed of the moving grating in cycles per frame. The figure shows the sOMS model enhancing object motion. On the Event map, white and black represent positive and negative polarities; on the sOMS map, white indicates the spikes of the sOMS model.
  • Figure 4: Average mean firing rate (MFR) and mean inter-spike interval (ISI) for the Eye+Object case under two scenarios. (a) Varying the center ($\sigma_{c}$) and surround ($\sigma_{s}$) values of the kernels, in pixels [$\sigma_{c}=1,\sigma_{s}=4$; $\sigma_{c}=2,\sigma_{s}=4$; $\sigma_{c}=3,\sigma_{s}=4$; $\sigma_{c}=4,\sigma_{s}=4$; $\sigma_{c}=2,\sigma_{s}=8$; $\sigma_{c}=4,\sigma_{s}=8$] (Figure 2). (b) Varying both the $\sigma$ values and the kernel size $s$ [$\sigma_{c}=1,\sigma_{s}=4, s=8$; $\sigma_{c}=4,\sigma_{s}=8, s=16$; $\sigma_{c}=8,\sigma_{s}=16, s=32$].
  • Figure 5: From left to right: Experiment, Parameters, RGB stimuli, Event map and OMS map for the Eye+object case with different spatial frequencies for the background and the foreground. Here, $fs$ is the spatial frequency and $s$ is the speed of the moving grating in cycles per frame. The map colors follow the same convention as in Figure 3.
  • ...and 8 more figures
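Figures 2 and 4 refer to center and surround Gaussian kernels parameterized by $\sigma_c$, $\sigma_s$ and a kernel size. The sketch below illustrates the general center-surround idea behind object motion sensitivity. It applies a static difference of Gaussians to one frame of accumulated event counts and then a hard threshold. This is not the authors' sOMS model, which is spiking. The defaults ($\sigma_c=1$, $\sigma_s=4$, a 17×17 kernel) and the threshold of 0.1 are hypothetical choices.

```python
import numpy as np
from scipy.signal import convolve2d


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Normalized 2-D Gaussian kernel of shape (size, size); size should be odd."""
    ax = np.arange(size) - (size - 1) / 2
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2 * sigma**2))
    return k / k.sum()


def oms_spikes(event_frame, sigma_c=1.0, sigma_s=4.0, size=17, threshold=0.1):
    """Binary OMS-like map from one frame of accumulated event counts.

    Hypothetical, non-spiking center-minus-surround approximation: both
    kernels sum to 1, so a spatially uniform event density excites center
    and surround equally and cancels, while a small isolated patch of
    events excites the center more than its surround.
    """
    frame = np.asarray(event_frame, dtype=float)
    # 'symm' boundary reflects the frame at its edges, avoiding border artifacts.
    center = convolve2d(frame, gaussian_kernel(size, sigma_c), mode="same", boundary="symm")
    surround = convolve2d(frame, gaussian_kernel(size, sigma_s), mode="same", boundary="symm")
    return (center - surround) > threshold
```

A spatially uniform event frame, a crude stand-in for whole-field ego-motion, produces no spikes. A small isolated patch of events, by contrast, drives the center harder than its surround and spikes. Changing `sigma_c`, `sigma_s` and `size` corresponds loosely to the parameter sweeps in Figure 4.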