GraphEnet: Event-driven Human Pose Estimation with a Graph Neural Network
Gaurvi Goyal, Pham Cong Thuong, Arren Glover, Masayoshi Mizuno, Chiara Bartolozzi
TL;DR
GraphEnet introduces a novel graph neural network framework for 2D human pose estimation from asynchronous event camera data. It builds a sparse input graph using a line-based intermediate representation (SCARF) and processes it with stacked SplineConv layers, culminating in a confidence-weighted pooling mechanism to predict joint positions. The approach achieves real-time performance (≈250 Hz, ≈4 ms latency) with a modest accuracy trade-off relative to RGB-based methods. It is validated on the eH36M and DHP19 datasets and supported by extensive ablations. The work demonstrates the potential of sparse event-based graphs to deliver high-frequency pose estimation at reduced computational and energy cost, and highlights avenues for future work on hierarchical or asynchronous graph updates.
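The TL;DR states that the network consumes a sparse input graph built from the SCARF representation and processed by SplineConv layers, but it does not say how nodes and edges are formed. The following is a minimal sketch that assumes nodes are the 2D pixel locations of active elements and that edges connect nodes within a fixed radius `r`. It shows how such a graph and the per-edge pseudo-coordinates that a SplineConv-style layer consumes could be built. The function `radius_graph` and the radius value are hypothetical, and GraphEnet's actual connectivity rule may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def radius_graph(points: List[Point], r: float) -> Tuple[List[Tuple[int, int]], List[Point]]:
    """Hypothetical sketch: connect every ordered pair of nodes within distance r.

    Returns directed edges (i, j) and, for each edge, 2D pseudo-coordinates in
    [0, 1]^2 obtained by normalizing the relative offset p_j - p_i by the radius.
    """
    if r <= 0:
        raise ValueError("radius must be positive")
    edges: List[Tuple[int, int]] = []
    pseudo: List[Point] = []
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i == j:
                continue  # no self-loops in this sketch
            dx, dy = xj - xi, yj - yi
            if math.hypot(dx, dy) <= r:
                edges.append((i, j))
                # Offsets lie in [-r, r]; map them linearly onto [0, 1],
                # the domain over which B-spline kernels are defined.
                pseudo.append((dx / (2 * r) + 0.5, dy / (2 * r) + 0.5))
    return edges, pseudo


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]
    e, u = radius_graph(nodes, r=1.5)
    print(e)  # → [(0, 1), (1, 0)]; the far node at (5, 5) stays isolated
```

The brute-force O(N²) pair search keeps the sketch short. A practical implementation would use a spatial grid or k-d tree so that graph construction does not dominate the per-frame latency budget.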
Abstract
Human Pose Estimation is a crucial module in human-machine interaction applications, and, especially since the rise of deep learning, robust methods are available to consumers using RGB cameras and commercial GPUs. Event-based cameras, on the other hand, have gained popularity in the vision research community for their low latency and low energy consumption, which make them ideal for applications where those resources are constrained, such as portable electronics and mobile robots. In this work we propose GraphEnet, a Graph Neural Network that leverages the sparse nature of event-camera output, together with an intermediate line-based event representation, to estimate the 2D pose of a single person at high frequency. The architecture incorporates a novel offset-vector learning paradigm with confidence-based pooling to estimate the human pose. This is the first work to apply Graph Neural Networks to event data for Human Pose Estimation. The code is open-source at https://github.com/event-driven-robotics/GraphEnet-NeVi-ICCV2025.
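The abstract names offset-vector learning with confidence-based pooling but does not give the pooling formula. The following is a minimal sketch that assumes three things: each graph node regresses a 2D offset from its own position to a joint, each node also outputs a scalar confidence for that joint, and confidences are normalized with a softmax over nodes. Under these assumptions, one joint estimate could be pooled as follows. The names `softmax` and `pool_joint` are hypothetical and are not taken from the GraphEnet code. In this sketch, the function would be applied independently for each joint.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def softmax(scores: List[float]) -> List[float]:
    """Normalize raw confidences into weights that are positive and sum to one."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_joint(node_pos: List[Vec], offsets: List[Vec], confidences: List[float]) -> Vec:
    """Hypothetical confidence-weighted pooling for a single joint.

    Node i votes for the location node_pos[i] + offsets[i]; the joint estimate
    is the softmax(confidences)-weighted average of all votes.
    """
    if not (len(node_pos) == len(offsets) == len(confidences)) or not node_pos:
        raise ValueError("need the same non-zero number of positions, offsets and confidences")
    weights = softmax(confidences)
    x = sum(w * (p[0] + o[0]) for w, p, o in zip(weights, node_pos, offsets))
    y = sum(w * (p[1] + o[1]) for w, p, o in zip(weights, node_pos, offsets))
    return (x, y)


if __name__ == "__main__":
    # Two equally confident nodes voting for (0, 0) and (4, 2) are averaged.
    print(pool_joint([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (3.0, 1.0)], [0.0, 0.0]))  # → (2.0, 1.0)
```

Because the softmax weights sum to one, the estimate is a convex combination of the node votes and always lies within their bounding box. A single high-confidence node dominates the result, while equally confident nodes are simply averaged.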
