Vector-Symbolic Architecture for Event-Based Optical Flow

Hongzhi You, Yijun Cao, Wei Yuan, Fanjun Wang, Ning Qiao, Yongjie Li

TL;DR

This work addresses robust optical flow estimation for event cameras by developing a high-dimensional (HD) feature descriptor based on Vector Symbolic Architectures (VSA). The descriptor fuses information across event polarities and multiple spatial scales, and it underpins two methods: VSA-Flow (model-based) and VSA-SM (self-supervised). On the DSEC-Flow benchmark, the VSA-based methods achieve higher accuracy than existing model-based and self-supervised methods, and on MVSEC they remain competitive with both, supporting the descriptor's effectiveness and flexibility for event-driven motion estimation. By reframing flow estimation as feature matching over HD, symbolic, multi-scale representations, the paper points toward accurate optical flow estimated from events alone, without grayscale images, with potential extensions to other neuromorphic vision tasks.
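Fusion across polarities and scales can be pictured with the standard VSA pattern of binding each channel to a role key and bundling the results. The sketch below is a hypothetical illustration, not the paper's descriptor: `encode_patch`, the Gaussian phase distribution, the 5 x 5 patch size, the random time-surface values, and the `("on" | "off", scale)` channel names are all assumptions. It checks that unbinding with the correct key recovers a vector similar to the original channel, while an unrelated key does not.

```python
import numpy as np

DIM = 2048  # hypervector dimensionality (illustrative)
rng = np.random.default_rng(1)


def random_key(dim, rng):
    # Random unit-phasor hypervector used as the role key of one channel.
    return np.exp(1j * rng.uniform(-np.pi, np.pi, size=dim))


def encode_patch(patch, phi_x, phi_y):
    # Hypothetical patch encoder: weight each pixel's fractional-power position code
    # exp(i * (x * phi_x + y * phi_y)) by its time-surface value and sum (bundle) them.
    k = patch.shape[0] // 2
    coords = np.arange(-k, k + 1)
    xs, ys = np.meshgrid(coords, coords, indexing="xy")
    codes = np.exp(1j * (xs[..., None] * phi_x + ys[..., None] * phi_y))
    return np.tensordot(patch, codes, axes=([0, 1], [0, 1]))


def fuse(channels, keys):
    # Bind each channel to its key (elementwise product), then bundle by summation.
    return sum(keys[name] * vec for name, vec in channels.items())


def unbind(fused, key):
    # Binding with a unit phasor is undone by multiplying with its complex conjugate.
    return fused * np.conj(key)


def cosine(a, b):
    # Real part of the normalized complex inner product.
    return float(np.real(np.vdot(b, a)) / (np.linalg.norm(a) * np.linalg.norm(b)))


phi_x = rng.normal(0.0, 0.5, size=DIM)
phi_y = rng.normal(0.0, 0.5, size=DIM)
# Six channels: two polarities x three scales, each from a random 5 x 5 patch.
channels = {
    (pol, scale): encode_patch(rng.random((5, 5)), phi_x, phi_y)
    for pol in ("on", "off")
    for scale in (1, 2, 3)
}
keys = {name: random_key(DIM, rng) for name in channels}
descriptor = fuse(channels, keys)

own = cosine(unbind(descriptor, keys[("on", 1)]), channels[("on", 1)])
unrelated = cosine(unbind(descriptor, random_key(DIM, rng)), channels[("on", 1)])
print(f"correct key: {own:.2f}  unrelated key: {unrelated:.2f}")
```

Because the six bound channels are quasi-orthogonal after unbinding, the correct key yields a clear match (roughly 1/sqrt(6) here) while the cross terms behave like noise.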

Abstract

From a feature-matching perspective, optical flow estimation for event cameras involves identifying event correspondences by comparing feature similarity across accompanying event frames. In this work, we introduce an effective and robust high-dimensional (HD) feature descriptor for event frames based on Vector Symbolic Architectures (VSA). The topological similarity among neighboring variables in VSA increases the representational similarity of feature descriptors at flow-matching points, while its capacity for structured symbolic representation enables feature fusion across both event polarities and multiple spatial scales. Based on this HD feature descriptor, we propose a novel feature-matching framework for event-based optical flow that encompasses both a model-based method (VSA-Flow) and a self-supervised learning method (VSA-SM). In VSA-Flow, accurate optical flow estimation validates the effectiveness of the HD feature descriptors. In VSA-SM, a novel similarity maximization method based on the HD feature descriptor learns optical flow in a self-supervised way from events alone, eliminating the need for auxiliary grayscale images. Evaluation results demonstrate that our VSA-based method achieves higher accuracy than both model-based and self-supervised learning methods on the DSEC benchmark, while remaining competitive with both on the MVSEC benchmark. This contribution marks a significant advancement for event-based optical flow within the feature-matching methodology.
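The claim about topological similarity can be illustrated with a small NumPy sketch in the spirit of Figure 1. It assumes that the basic VSA kernel assigns an independent random phasor hypervector to every grid point, while the VFA kernel is approximated by fractional power encoding, $z(x, y) = \exp(i(x\phi_x + y\phi_y))$, with Gaussian-distributed phases. The dimensionality `DIM` and the kernel width `LENGTH_SCALE` are illustrative choices, not values from the paper.

```python
import numpy as np

DIM = 2048          # hypervector dimensionality (illustrative)
N = 21              # neighborhood size, as in Figure 1
LENGTH_SCALE = 4.0  # assumed kernel width in pixels

rng = np.random.default_rng(0)


def basic_vsa_kernel(n, dim, rng):
    # Basic VSA: an independent random unit-phasor hypervector for every grid point.
    phases = rng.uniform(-np.pi, np.pi, size=(2 * n + 1, 2 * n + 1, dim))
    return np.exp(1j * phases)


def vfa_kernel(n, dim, length_scale, rng):
    # VFA-style fractional power encoding: z(x, y) = exp(i * (x * phi_x + y * phi_y)).
    # Gaussian phases make the expected similarity exp(-(dx^2 + dy^2) / (2 * length_scale^2)).
    phi_x = rng.normal(0.0, 1.0 / length_scale, size=dim)
    phi_y = rng.normal(0.0, 1.0 / length_scale, size=dim)
    coords = np.arange(-n, n + 1)
    xs, ys = np.meshgrid(coords, coords, indexing="xy")
    return np.exp(1j * (xs[..., None] * phi_x + ys[..., None] * phi_y))


def similarity_map(kernel):
    # Similarity of every hypervector in the grid to the one at the origin D(0, 0):
    # real part of the normalized complex inner product (1.0 for identical vectors).
    n = kernel.shape[0] // 2
    origin = kernel[n, n]
    return np.real(kernel @ np.conj(origin)) / kernel.shape[-1]


n = N // 2
vsa_sim = similarity_map(basic_vsa_kernel(n, DIM, rng))
vfa_sim = similarity_map(vfa_kernel(n, DIM, LENGTH_SCALE, rng))
# One pixel to the right of the origin: near 0 for basic VSA, close to 1 for VFA.
print(f"basic VSA: {vsa_sim[n, n + 1]:.3f}  VFA: {vfa_sim[n, n + 1]:.3f}")
```

With Gaussian phases the expected VFA similarity decays smoothly with pixel offset, so neighboring points keep high similarity, whereas the basic VSA similarity is near zero everywhere except at the origin.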

Paper Structure (28 sections, 18 equations, 6 figures, 4 tables)

Figures (6)

  • Figure 1: Topological similarity in 2-D space for basic VSA and VFA HD kernels. Panels (a) and (b) show, for the basic VSA and VFA HD kernels respectively, the similarity between the hypervector at the center point and those at the surrounding points. Comparing the hypervector at the origin ($D(0,0)$) with those in its surrounding $N \times N$ neighborhood shows that the VFA HD kernel, unlike the basic VSA HD kernel, captures spatial topological similarity. Here $N=21$ ($n = \lfloor N/2 \rfloor = 10$).
  • Figure 2: Schematic of the proposed VSA-Flow method for event-based optical flow. (a) Acquisition of HD feature descriptors from accumulated time surfaces (TSs) using a multi-scale strategy. (b) VSA-Flow consists of HD feature extractors, a cost volume module, and an optical flow estimator. The HD feature extractors compute HD feature descriptors from TSs. The cost volume module computes local visual similarity by forming volumes that represent the similarity between three TS pairs with different time intervals at different scales. The optical flow estimator generates flow from this local visual similarity. (c) The three cost volumes at different scales are fused directly by summation to form the final local visual similarity within the cost volume module.
  • Figure 3: Self-supervised optical flow learning via similarity maximization based on HD feature descriptors. (a) Multi-frame approach for flow refinement. Within the time interval $\Delta t$, $K=5$ pairs of HD feature descriptors ($F^0 \rightarrow F^k$, $k = 1, \dots, K$) with progressively increasing intervals are used to compute the similarity between events and their corresponding matching points, improving the accuracy of optical flow estimation. (b) Similarity calculation for HD descriptors between events and their predicted flow-matching points.
  • Figure 4: Probability density of the similarity between matching points under ground-truth (GT) optical flow on two datasets. For each dataset, the similarity of HD feature descriptors is computed for $N_{match}$ pairs of flow-matching points given by the GT flow. The probability density describes how likely the feature similarity of flow-matching points is to take a given value. Compared to the basic VSA, VFA encodes the similarity of matching points in event frames more effectively.
  • Figure 5: Qualitative comparison of our methods with the state-of-the-art E-RAFT architecture on several test sequence partitions of the DSEC dataset [gehrig2021raft].
  • ...and 1 more figure
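The matching pipeline in Figures 2 and 3 can be sketched as follows, assuming descriptor maps of shape (H, W, D) holding complex hypervectors. `cost_volume` scores every integer displacement within a radius, `fused_cost_volume` sums cost volumes from several descriptor pairs (mirroring the summation in Figure 2(c)), `winner_take_all_flow` picks the best displacement per pixel, and `similarity_loss` is a negative mean similarity at the predicted matching points in the spirit of VSA-SM. All function names are hypothetical, and the winner-take-all estimator, integer flow, and circular boundaries are simplifications, not the paper's flow estimator or sampling scheme.

```python
import numpy as np


def cost_volume(f0, f1, radius):
    # Similarity between f0[y, x] and f1[y + dy, x + dx] for every integer displacement
    # in a (2r + 1) x (2r + 1) window. np.roll gives circular boundaries (a simplification).
    disps = [(dy, dx) for dy in range(-radius, radius + 1) for dx in range(-radius, radius + 1)]
    dim = f0.shape[-1]
    vols = []
    for dy, dx in disps:
        shifted = np.roll(f1, shift=(-dy, -dx), axis=(0, 1))  # shifted[y, x] = f1[y + dy, x + dx]
        vols.append(np.real(np.sum(f0 * np.conj(shifted), axis=-1)) / dim)
    return np.stack(vols, axis=-1), np.array(disps)


def fused_cost_volume(pairs, radius):
    # Sum the cost volumes of several descriptor pairs (e.g. different scales or intervals).
    total, disps = None, None
    for f0, f1 in pairs:
        vol, disps = cost_volume(f0, f1, radius)
        total = vol if total is None else total + vol
    return total, disps


def winner_take_all_flow(volume, disps):
    # Best-scoring displacement per pixel, returned as an (H, W, 2) array of (dy, dx).
    return disps[np.argmax(volume, axis=-1)]


def similarity_loss(f0, f1, flow):
    # Negative mean similarity between f0[p] and f1[p + flow(p)]; lower is better.
    h, w, dim = f0.shape
    ys, xs = np.mgrid[0:h, 0:w]
    matched = f1[(ys + flow[..., 0]) % h, (xs + flow[..., 1]) % w]
    return -float(np.mean(np.real(np.sum(f0 * np.conj(matched), axis=-1)) / dim))


rng = np.random.default_rng(2)
H, W, DIM = 16, 16, 512
true_flow = (1, -2)
# Two synthetic descriptor pairs of random unit-phasor hypervectors, both shifted by true_flow.
pairs = []
for _ in range(2):
    f0 = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(H, W, DIM)))
    pairs.append((f0, np.roll(f0, shift=true_flow, axis=(0, 1))))

volume, disps = fused_cost_volume(pairs, radius=3)
flow = winner_take_all_flow(volume, disps)
loss_at_flow = similarity_loss(*pairs[0], flow)
loss_at_zero = similarity_loss(*pairs[0], np.zeros_like(flow))
print(np.all(flow == true_flow), round(loss_at_flow, 3), round(loss_at_zero, 3))
```

In a learned setting such as VSA-SM, the flow would presumably be predicted by a network and an objective of this kind minimized with gradients, which calls for differentiable (for example bilinear) sampling rather than the integer indexing used here.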