Motion Segmentation for Neuromorphic Aerial Surveillance

Sami Arja, Alexandre Marcireau, Saeed Afshar, Bharath Ramesh, Gregory Cohen

TL;DR

This paper introduces a novel motion segmentation method that applies self-supervised vision transformers to both event data and optical flow, eliminating the need for human annotations and reducing dependency on scene-specific parameters.

Abstract

Aerial surveillance demands rapid and precise detection of moving objects in dynamic environments. Event cameras, which draw inspiration from biological vision systems, present a promising alternative to frame-based sensors due to their exceptional temporal resolution, superior dynamic range, and minimal power requirements. Unlike traditional frame-based sensors that capture redundant information at fixed intervals, event cameras asynchronously record pixel-level brightness changes, providing a continuous and efficient data stream. Although this makes them well suited to fast motion segmentation, existing event-based motion segmentation methods often require per-scene parameter tuning or rely on manual labelling, which hinders their scalability and practical deployment. In this paper, we address these challenges by introducing a novel motion segmentation method that leverages self-supervised vision transformers on both event data and optical flow information. Our approach eliminates the need for human annotations and reduces dependency on scene-specific parameters. We use the Prophesee EVK4-HD event camera onboard a highly dynamic aerial platform in urban settings. We conduct extensive evaluations of our framework across multiple datasets, demonstrating state-of-the-art performance compared to existing benchmarks. Our method can effectively handle various types of motion and an arbitrary number of moving objects. Code and dataset are available at: https://samiarja.github.io/evairborne/

Paper Structure (17 sections, 5 equations, 13 figures, 8 tables)

Figures (13)

  • Figure 2: Overview of the event-based motion segmentation architecture. Our method performs pixel-wise motion segmentation in two consecutive stages. First, the incoming event stream is transformed into a frame, which, along with the optical flow generated by RAFT [teed2020raft], feeds into a self-supervised DINO model [caron_emerging_2021]. DINO extracts a series of feature vectors, after which a dynamic refinement strategy fine-tunes the mask predictions. Second, events within the predicted mask are isolated, and the CMax algorithm [gallego_unifying_2018], combined with blur detection [golestaneh2017spatially], assigns both a continuous motion label and a discrete label to each event.
  • Figure 3: Illustration of the DMR method [random_walk_neurips2020, xie2022segmenting], showing its role in maintaining temporal consistency and addressing issues with noisy optical flow. It handles scenarios where masks are either absent or irregular (in red boxes), ensuring that these inconsistencies do not disrupt the motion compensation process over time.
  • Figure 4: Combined qualitative results on the EED [mitrokhin_event-based_2018] (red bounding boxes are ground truth), EV-IMO [mitrokhin_ev-imo_2019], EV-IMO2 [EVIMO2], DistSurf [almatrafi_distance_2020], and HKUST-EMS [zhou_event-based_2021] datasets.
  • Figure 5: Comparison of the motion segmentation output of our method against EMSGC [zhou_event-based_2021] on the Ev-Airborne dataset. From left to right, Rows 1, 2: Moving SUV low oblique, Moving SUV/Dome, Golf car high oblique, small cars high oblique, airplane takeoff 1. Rows 3, 4: Small car high oblique, Pedestrians, moving car low oblique, big cars low oblique, airplane takeoff 2. This demonstrates that our approach does not over-segment the scene and assigns the correct labels to the most salient objects, with a single label for the background.
  • Figure 6: Qualitative results from the DMR process showing its temporal consistency over time. The red bounding boxes highlight the frames where DMR successfully recovers the salient mask of the moving objects. SM refers to salient masks.
  • ...and 8 more figures
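
The Figure 2 caption mentions two generic building blocks: turning an event stream into a frame, and contrast maximization (CMax) over events. The sketch below illustrates both ideas on synthetic data and is not the authors' implementation. The function names (`events_to_frame`, `warp_events`, `contrast`), the count-image representation, the constant-velocity warp, and the variance objective are all assumptions chosen for illustration.

```python
import numpy as np


def events_to_frame(x, y, width, height):
    """Accumulate events into a count image (one increment per event)."""
    frame = np.zeros((height, width), dtype=np.float64)
    xi = np.rint(x).astype(int)
    yi = np.rint(y).astype(int)
    # Drop events that fall outside the sensor after rounding or warping.
    inside = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
    # np.add.at handles repeated pixel indices correctly (unbuffered add).
    np.add.at(frame, (yi[inside], xi[inside]), 1.0)
    return frame


def warp_events(x, y, t, velocity, t_ref=0.0):
    """Move each event back to t_ref along an assumed constant velocity (px/s)."""
    vx, vy = velocity
    dt = t - t_ref
    return x - vx * dt, y - vy * dt


def contrast(x, y, t, velocity, width, height):
    """Variance of the image of warped events; higher means sharper."""
    wx, wy = warp_events(x, y, t, velocity)
    return float(np.var(events_to_frame(wx, wy, width, height)))


# Synthetic example: a single point moving at 10 px/s along x for one second.
t = np.linspace(0.0, 1.0, 11)
x = 2.0 + 10.0 * t
y = np.full_like(t, 5.0)

blurred = contrast(x, y, t, (0.0, 0.0), width=16, height=8)
sharp = contrast(x, y, t, (10.0, 0.0), width=16, height=8)
print(sharp > blurred)  # True
```

With the correct candidate velocity, all events collapse onto one pixel and the image variance rises, which is the signal a contrast-maximization search would follow. A real pipeline would search over motion parameters and use richer motion models than this single translation.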