GazeMotion: Gaze-guided Human Motion Forecasting
Zhiming Hu, Syn Schmitt, Daniel Haeufle, Andreas Bulling
TL;DR
GazeMotion forecasts human motion by using eye gaze alongside historical poses. The approach first predicts future gaze from past gaze, then fuses the predicted gaze with past poses into a gaze-pose graph, and finally uses a residual spatio-temporal graph convolutional network to forecast future poses. Across MoGaze, ADT, and GIMO, it improves mean per joint position error (MPJPE) by up to 7.4% over state-of-the-art methods. It remains effective when gaze is unavailable by using head direction as a proxy, and an online user study confirms improved perceived realism. The work highlights the significant information content in gaze for motion forecasting and provides a practical framework for gaze-guided prediction in robotics and HCI contexts.
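The fusion and residual steps described above can be illustrated with a small NumPy sketch. It is not the authors' architecture: it assumes that poses are stored as a (T, J, 3) array, that gaze is a (T, 3) direction vector covering the same number of frames, and that gaze enters the graph as one extra node per frame. The function names `build_gaze_pose_graph` and `residual_graph_conv`, the tanh nonlinearity, and the dense adjacency matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def build_gaze_pose_graph(past_poses, predicted_gaze):
    """Stack body joints and one gaze node into a node-feature array.

    Assumed layout (not taken from the paper):
      past_poses:     (T, J, 3) joint positions over T frames
      predicted_gaze: (T, 3) gaze direction vectors for the same T frames
    Returns an array of shape (T, J + 1, 3).
    """
    past_poses = np.asarray(past_poses, dtype=float)
    predicted_gaze = np.asarray(predicted_gaze, dtype=float)
    if past_poses.ndim != 3 or past_poses.shape[-1] != 3:
        raise ValueError("past_poses must have shape (T, J, 3)")
    if predicted_gaze.shape != (past_poses.shape[0], 3):
        raise ValueError("predicted_gaze must have shape (T, 3)")

    # Normalise gaze to unit length so it encodes direction only.
    norms = np.linalg.norm(predicted_gaze, axis=-1, keepdims=True)
    unit_gaze = predicted_gaze / np.maximum(norms, 1e-8)

    # Append gaze as an extra node after the J joint nodes.
    return np.concatenate([past_poses, unit_gaze[:, None, :]], axis=1)


def residual_graph_conv(nodes, adjacency, weight):
    """One spatial graph convolution with a skip connection.

    nodes:     (T, N, C) node features
    adjacency: (N, N) node-mixing matrix (learned in a real model)
    weight:    (C, C) feature-mixing matrix (learned in a real model)
    Computes nodes + tanh(adjacency @ nodes @ weight) per frame.
    """
    mixed = np.einsum("ij,tjc->tic", adjacency, nodes)
    return nodes + np.tanh(mixed @ weight)
```

With 21 joints and 10 frames, `build_gaze_pose_graph` returns a (10, 22, 3) array, and `residual_graph_conv` preserves that shape, so several such layers could be stacked. A spatio-temporal variant would also mix information along the frame axis, which this sketch omits.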
Abstract
We present GazeMotion, a novel method for human motion forecasting that combines information on past human poses with human eye gaze. Inspired by evidence from behavioural sciences showing that human eye and body movements are closely coordinated, GazeMotion first predicts future eye gaze from past gaze, then fuses predicted future gaze and past poses into a gaze-pose graph, and finally uses a residual graph convolutional network to forecast body motion. We extensively evaluate our method on the MoGaze, ADT, and GIMO benchmark datasets and show that it outperforms state-of-the-art methods by up to 7.4% in mean per joint position error. Using head direction as a proxy for gaze, our method still achieves an average improvement of 5.5%. Finally, we report an online user study showing that our method also outperforms prior methods in terms of perceived realism. These results show the significant information content available in eye gaze for human motion forecasting as well as the effectiveness of our method in exploiting this information.
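For readers unfamiliar with the metric, mean per joint position error is commonly computed as the Euclidean distance between predicted and ground-truth joint positions, averaged over all joints and frames. The sketch below follows that common definition. The array layout (..., J, 3), the helper names `mpjpe` and `relative_improvement`, and the assumption that a percentage improvement means the relative reduction in error against a baseline are illustrative and are not confirmed by the text above.

```python
import numpy as np


def mpjpe(predicted, target):
    """Mean per joint position error.

    Both inputs share a shape ending in (J, 3), for example (T, J, 3).
    Returns the mean Euclidean distance over all joints and frames,
    in the same unit as the inputs.
    """
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    if predicted.shape != target.shape:
        raise ValueError("predicted and target must have the same shape")
    return float(np.linalg.norm(predicted - target, axis=-1).mean())


def relative_improvement(baseline_error, method_error):
    """Percentage reduction in error relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_error - method_error) / baseline_error
```

As a usage example with made-up numbers, a baseline error of 10.0 and a method error of 9.0 give `relative_improvement(10.0, 9.0) == 10.0`, that is, a 10% improvement under this definition.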
