Hypergraph-based Multi-View Action Recognition using Event Cameras

Yue Gao, Jiaxuan Lu, Siqi Li, Yipeng Li, Shaoyi Du

TL;DR

Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state of the art in frame-based multi-view action recognition.

Abstract

Action recognition from video data is a cornerstone task with wide-ranging applications. Single-view action recognition is limited by its reliance on a single viewpoint. In contrast, multi-view approaches capture complementary information from various viewpoints for improved accuracy. Recently, event cameras have emerged as innovative bio-inspired sensors, leading to advancements in event-based action recognition. However, existing works predominantly focus on single-view scenarios, leaving a gap in the exploitation of multi-view event data, particularly with respect to challenges such as information deficit and semantic misalignment. To bridge this gap, we introduce HyperMV, a multi-view event-based action recognition framework. HyperMV converts discrete event data into frame-like representations and extracts view-related features using a shared convolutional network. By treating segments as vertices and constructing hyperedges using rule-based and KNN-based strategies, HyperMV builds a multi-view hypergraph neural network that captures relationships across viewpoint and temporal features. A vertex attention hypergraph propagation is also introduced for enhanced feature fusion. To promote research in this area, we present the largest multi-view event-based action dataset, $\text{THU}^{\text{MV-EACT}}\text{-50}$, comprising 50 actions from 6 viewpoints, which surpasses existing datasets by over tenfold. Experimental results show that HyperMV significantly outperforms baselines in both cross-subject and cross-view scenarios, and also exceeds the state of the art in frame-based multi-view action recognition.
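
To make the hypergraph construction in the abstract more concrete, the following is a minimal sketch of how segments could be arranged as vertices and connected by rule-based and KNN-based hyperedges. It is not the authors' implementation. The function names, the vertex ordering, the specific rules (one hyperedge per view and one per temporal segment), and the use of Euclidean distance for the KNN step are all assumptions made for illustration.

```python
import numpy as np

def rule_based_hyperedges(num_views, num_segments):
    """Incidence matrix with one hyperedge per view and one per temporal segment.

    Assumption: vertex index = view * num_segments + segment.
    """
    n = num_views * num_segments
    H = np.zeros((n, num_views + num_segments))
    for v in range(num_views):
        # Hyperedge v joins all segments recorded from view v.
        H[v * num_segments:(v + 1) * num_segments, v] = 1.0
    for t in range(num_segments):
        # Hyperedge (num_views + t) joins segment t across all views.
        H[t::num_segments, num_views + t] = 1.0
    return H

def knn_hyperedges(features, k):
    """Incidence matrix with one hyperedge per vertex: the vertex plus k neighbours.

    Assumption: neighbours are chosen by squared Euclidean distance.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    n = features.shape[0]
    # The k + 1 closest vertices; this normally includes the centre vertex itself.
    nearest = np.argsort(dist, axis=1)[:, :k + 1]
    H = np.zeros((n, n))
    for e in range(n):
        H[nearest[e], e] = 1.0
    return H

def build_incidence(features, num_views, num_segments, k):
    """Concatenate rule-based and KNN-based hyperedges column-wise."""
    H_rule = rule_based_hyperedges(num_views, num_segments)
    H_knn = knn_hyperedges(features, k)
    return np.concatenate([H_rule, H_knn], axis=1)
```

In this sketch the rule-based hyperedges encode explicit structure (same view, same time), while the KNN-based hyperedges depend only on feature similarity, so they can link segments from different views and times.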

Paper Structure (23 sections, 10 equations, 8 figures, 7 tables)

This paper contains 23 sections, 10 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: Addressing information deficit and semantic misalignment in multi-view event-based action recognition, the proposed multi-view hypergraph neural network leverages rule-based and KNN-based hyperedges to correlate features across views and temporal segments.
  • Figure 2: The pipeline of the proposed multi-view event-based action recognition framework, including Event Processing, View Feature Extraction, Multi-View Hypergraph Neural Network, and Action Prediction.
  • Figure 3: Rule-based and KNN-based hyperedges are combined to model explicit and implicit associations of features.
  • Figure 4: Vertex attention hypergraph propagation via the vertex-hyperedge-vertex process: (1) Initial vertex features enter a fully connected layer for feature extraction. (2) Hyperedge features are then weighted from these vertex features. (3) Subsequently, these hyperedge features are fused together to formulate the vertex features for the next layer.
  • Figure 5: Examples in the collected $\text{THU}^{\text{MV-EACT}}\text{-50}$ dataset.
  • ...and 3 more figures
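
The vertex-hyperedge-vertex process described in the Figure 4 caption can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's formulation: the attention score (a dot product with a hypothetical learnable vector `a`), the softmax over the vertices of each hyperedge, and the mean fusion of hyperedge features back to vertices are all choices made here for simplicity. The sketch also assumes every hyperedge contains at least one vertex and every vertex belongs to at least one hyperedge.

```python
import numpy as np

def vertex_attention_propagation(X, H, W, a):
    """One vertex -> hyperedge -> vertex step on incidence matrix H (n x m).

    X: (n, d_in) vertex features, W: (d_in, d_out) weights of the
    fully connected layer, a: (d_out,) hypothetical attention vector.
    Returns the next-layer vertex features and the attention weights.
    """
    # (1) Fully connected layer applied to the vertex features.
    Z = X @ W
    # One scalar attention score per vertex (assumed scoring function).
    scores = Z @ a
    # (2) Softmax over the vertices inside each hyperedge; vertices that are
    # not in a hyperedge are masked out with -inf so their weight is zero.
    masked = np.where(H > 0, scores[:, None], -np.inf)
    masked = masked - masked.max(axis=0, keepdims=True)
    weights = np.exp(masked)
    weights = weights / weights.sum(axis=0, keepdims=True)
    # Hyperedge features are attention-weighted sums of vertex features.
    E = weights.T @ Z
    # (3) Each vertex averages the features of its incident hyperedges.
    degree = H.sum(axis=1, keepdims=True)
    X_next = (H @ E) / degree
    return X_next, weights
```

With a single hyperedge that contains every vertex, an identity `W`, and a zero `a`, the attention is uniform and every vertex receives the mean of all vertex features, which is a quick sanity check for the sketch.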