Continuous-Time Human Motion Field from Events
Ziyun Wang, Ruijun Zhang, Zi-Yan Liu, Yufu Wang, Kostas Daniilidis
TL;DR
This work introduces EvHuman, the first method to predict a continuous-time human motion field directly from event streams. It leverages a neural motion prior and a time-continuous decoder, which allow poses to be queried at arbitrary timestamps with parallel inference. The method combines a GRU-based event predictor, NeMF-based motion priors, and a differentiable event-contrastive supervision signal to jointly estimate local SMPL poses and global motion, while avoiding the computational bottleneck of discrete-pose optimization. The authors demonstrate higher accuracy and substantially faster inference than prior event-based methods on MMHPSD and on the new BEAHM dataset, which provides hardware-synchronized, high-frame-rate ground truth at 120 FPS. BEAHM, together with the proposed losses and training scheme, enables robust evaluation of high-speed human motion across varied lighting conditions and motion types, highlighting EvHuman's practical value for real-time, high-fidelity motion capture from events.
Abstract
This paper addresses the challenge of estimating a continuous-time human motion field from a stream of events. Existing Human Mesh Recovery (HMR) methods are predominantly frame-based, which makes them prone to aliasing and inaccuracies caused by limited temporal resolution and motion blur. In this work, we predict a continuous-time human motion field directly from events, using a recurrent feed-forward neural network that predicts human motion in the latent space of possible human motions. Prior state-of-the-art event-based methods rely on computationally intensive optimization over a fixed number of poses at high frame rates, which becomes prohibitively expensive as the temporal resolution increases. In contrast, we present the first work that replaces traditional discrete-time predictions with a continuous human motion field represented as a time-implicit function, enabling parallel pose queries at arbitrary temporal resolutions. Despite the promise of event cameras, few benchmarks have tested the limits of high-speed human motion estimation. To fill this gap, we introduce the Beam-splitter Event Agile Human Motion Dataset (BEAHM), a hardware-synchronized high-speed human motion dataset. On this new data, our method reduces joint errors by 23.8% compared to previous event-based human motion methods while reducing computational time by 69%.
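To make the idea of a "time-implicit" motion field concrete, the following is a toy sketch rather than the authors' decoder (which builds on NeMF). It assumes a single latent code `z` describing a motion clip, normalized timestamps in [0, 1], sinusoidal time features, and a randomly initialized two-layer MLP; the names `time_encoding` and `ToyMotionField` are hypothetical. The 72-dimensional output assumes SMPL's 24 joints times 3 axis-angle parameters. The point it illustrates is that each pose query depends only on `(z, t)` and not on neighboring frames, so any number of timestamps can be decoded in one batched pass.

```python
import numpy as np


def time_encoding(t, num_freqs=4):
    """Sinusoidal features for normalized timestamps; shape (T,) -> (T, 2*num_freqs)."""
    t = np.atleast_1d(np.asarray(t, dtype=np.float64))[:, None]  # (T, 1)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi                # (F,)
    angles = t * freqs                                           # (T, F)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)


class ToyMotionField:
    """Hypothetical time-implicit decoder f(z, t) -> pose (NOT the EvHuman/NeMF model)."""

    def __init__(self, latent_dim=16, pose_dim=72, hidden=64, num_freqs=4, seed=0):
        rng = np.random.default_rng(seed)
        self.num_freqs = num_freqs
        in_dim = latent_dim + 2 * num_freqs
        # Random weights stand in for a trained network.
        self.W1 = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 0.1, (hidden, pose_dim))
        self.b2 = np.zeros(pose_dim)

    def query(self, z, t):
        """Decode poses at arbitrary timestamps t from one latent code z.

        Rows are independent of each other, so all T queries are a single
        batched matrix product instead of a sequential per-frame loop.
        """
        enc = time_encoding(t, self.num_freqs)                   # (T, 2F)
        z_rep = np.broadcast_to(z, (enc.shape[0], z.shape[0]))   # (T, D)
        x = np.concatenate([z_rep, enc], axis=-1)                # (T, D + 2F)
        h = np.tanh(x @ self.W1 + self.b1)
        return h @ self.W2 + self.b2                             # (T, pose_dim)


# Usage: the same latent code queried at two different temporal resolutions.
field = ToyMotionField()
z = np.random.default_rng(1).normal(size=16)
poses_30 = field.query(z, np.linspace(0.0, 1.0, 30))    # coarse sampling
poses_120 = field.query(z, np.linspace(0.0, 1.0, 120))  # e.g. a 120 FPS grid
pose_single = field.query(z, 0.0)                       # one arbitrary timestamp
```

Because the temporal resolution is chosen only at query time, changing from 30 to 120 samples costs one larger batch rather than a larger optimization problem; this mirrors, in simplified form, the contrast the abstract draws with discrete-pose optimization.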
