GazeMotion: Gaze-guided Human Motion Forecasting

Zhiming Hu, Syn Schmitt, Daniel Haeufle, Andreas Bulling

TL;DR

GazeMotion forecasts human motion from eye gaze together with past poses. It first predicts future gaze from past gaze, then fuses the predicted gaze with past poses into a gaze-pose graph, and finally uses a residual spatio-temporal graph convolutional network to forecast future poses. On MoGaze, ADT, and GIMO, it improves mean per joint position error (MPJPE) by up to 7.4% over state-of-the-art methods. When eye gaze is unavailable, using head direction as a proxy still yields an average improvement of 5.5%, and an online user study found its predictions more realistic than those of prior methods. The results show that gaze carries significant information for motion forecasting and give a practical framework for gaze-guided prediction in robotics and HCI contexts.

Abstract

We present GazeMotion, a novel method for human motion forecasting that combines information on past human poses with human eye gaze. Inspired by evidence from behavioural sciences showing that human eye and body movements are closely coordinated, GazeMotion first predicts future eye gaze from past gaze, then fuses predicted future gaze and past poses into a gaze-pose graph, and finally uses a residual graph convolutional network to forecast body motion. We extensively evaluate our method on the MoGaze, ADT, and GIMO benchmark datasets and show that it outperforms state-of-the-art methods by up to 7.4% in mean per joint position error. Using head direction as a proxy for gaze, our method still achieves an average improvement of 5.5%. We finally report an online user study showing that our method also outperforms prior methods in terms of perceived realism. These results show the significant information content available in eye gaze for human motion forecasting as well as the effectiveness of our method in exploiting this information.
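
The headline numbers above are stated in terms of mean per joint position error. As a minimal sketch of how such a metric is commonly computed, the snippet below averages the Euclidean distance between predicted and ground-truth joint positions over all frames and joints. The function names, the nested-list layout `[frames][joints][3]`, and the reading of "improvement" as a relative error reduction against a baseline are assumptions for illustration, not details taken from the paper.

```python
import math


def mpjpe(pred, target):
    """Mean per joint position error.

    pred, target: nested lists shaped [frames][joints][3], in the same units.
    Returns the Euclidean joint error averaged over all frames and joints.
    """
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += math.sqrt(sum((a - b) ** 2 for a, b in zip(p, t)))
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


def relative_improvement(baseline_error, method_error):
    """Percentage reduction in error relative to a baseline (assumed definition)."""
    return 100.0 * (baseline_error - method_error) / baseline_error


# One frame, two joints: the first joint is off by 5 units, the second is exact.
target = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
pred = [[[3.0, 4.0, 0.0], [1.0, 0.0, 0.0]]]
error = mpjpe(pred, target)
```

Under this reading, a method with an error of 92.6 against a baseline error of 100.0 (made-up values) would correspond to a 7.4% improvement.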

Paper Structure

This paper contains 32 sections, 9 equations, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Our method first forecasts future eye gaze from past gaze using a 1D convolutional neural network, then fuses the predicted gaze and past body poses into a gaze-pose graph, and finally applies a graph convolutional network consisting of a start, a residual, and an end module to forecast body motion.
  • Figure 2: Visualisation of the predictions of different methods on MoGaze (Kratzer et al., 2020). Our method consistently outperforms other methods when predicting 400 ms and 800 ms into the future.
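
The fusion and residual steps named in the Figure 1 caption can be illustrated with a small sketch. It assumes that gaze enters the graph as one extra node per frame and that a residual graph convolution has the generic form X + A X W. Both are assumptions for illustration: the caption does not specify the graph layout, the adjacency, or the layer details, and the helper names below are hypothetical.

```python
def fuse_gaze_pose(poses, gaze):
    """Append each frame's gaze vector as one extra graph node (assumed layout).

    poses: [frames][joints][3] joint positions
    gaze:  [frames][3] gaze directions, assumed to be already predicted
    returns [frames][joints + 1][3]
    """
    if len(poses) != len(gaze):
        raise ValueError("poses and gaze must cover the same frames")
    return [[list(j) for j in frame] + [list(g)] for frame, g in zip(poses, gaze)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def residual_graph_conv(nodes, adjacency, weight):
    """One residual graph convolution on a single frame: X + A X W."""
    mixed = matmul(matmul(adjacency, nodes), weight)
    return [[x + m for x, m in zip(x_row, m_row)] for x_row, m_row in zip(nodes, mixed)]


# One frame with two joints plus one gaze direction gives three graph nodes.
poses = [[[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]]
gaze = [[0.0, 0.0, 3.0]]
fused = fuse_gaze_pose(poses, gaze)

# Uniform adjacency and identity weights stand in for learned parameters.
n = len(fused[0])
adjacency = [[1.0 / n] * n for _ in range(n)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
out = residual_graph_conv(fused[0], adjacency, identity)
```

With a uniform adjacency, every node is shifted by the mean of all nodes, so the gaze node influences each joint. In a trained network the adjacency and weights would be learned rather than fixed as they are here.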