MAGE: A Multi-task Architecture for Gaze Estimation with an Efficient Calibration Module

Haoming Huang, Musen Zhang, Jianxin Yang, Zhen Li, Jinkai Li, Yao Guo

TL;DR

Proposes MAGE, a multi-task architecture that estimates complete 6-DoF gaze from RGB facial images without screen-based calibration. It combines the Easy-Norm normalization module and the Easy-Calibration module to achieve robust subject-specific fine-tuning and cross-subject generalization, delivering state-of-the-art results on MPIIFaceGaze, EYEDIAP, and IMRGaze. The model outputs both the gaze direction and the PoGz (the gaze point on the camera's XY-plane), using multi-task constraints to improve consistency among the directional, positional, and normalization components. This work enables practical, device-independent 3D gaze analysis with efficient calibration for real-world HRI applications.
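The Figure 3 caption defines PoGz as the point where the gaze vector $\bm{g}_o$ meets the $XY$-plane of the CCS. A minimal Python sketch of that ray-plane intersection follows. It assumes the gaze ray starts at a known 3D gaze origin (for example the face center) expressed in camera coordinates; the function name `pogz` is hypothetical and not taken from the authors' code. Converting PoGz into a PoG in the screen coordinate system (SCS) would additionally require the screen-to-camera pose, which is not sketched here.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def pogz(origin: Vec3, gaze: Vec3, eps: float = 1e-9) -> Optional[Tuple[float, float]]:
    """Intersect the gaze ray origin + t * gaze (t > 0) with the CCS XY-plane (z = 0).

    origin: assumed 3D gaze origin (e.g. the face center) in camera coordinates.
    gaze:   gaze direction g_o in camera coordinates (need not be unit length).
    Returns the (x, y) intersection on the camera plane, or None if the ray is
    parallel to the plane or points away from it.
    """
    ox, oy, oz = origin
    gx, gy, gz = gaze
    if abs(gz) < eps:
        return None  # ray is (nearly) parallel to the camera plane
    t = -oz / gz  # solve oz + t * gz = 0 for the ray parameter
    if t <= 0:
        return None  # the plane lies behind the gaze origin
    return (ox + t * gx, oy + t * gy)
```

For instance, a gaze origin 600 mm in front of the camera looking along (0.1, -0.05, -1.0) gives t = 600 and lands at (60, -30) mm on the camera plane.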

Abstract

Eye gaze can provide rich information about human psychological activity and has attracted significant attention in the field of Human-Robot Interaction (HRI). However, existing gaze estimation methods predict only the gaze direction or the Point-of-Gaze (PoG) on a screen, which does not provide sufficient information for a comprehensive six Degree-of-Freedom (DoF) gaze analysis in 3D space. Moreover, variations in eye shape and structure among individuals impede the generalization capability of these methods. In this study, we propose MAGE, a Multi-task Architecture for Gaze Estimation with an efficient calibration module, which predicts 6-DoF gaze information applicable to real-world HRI. Our basic model encodes both directional and positional features from facial images and predicts gaze through a dedicated information flow and multiple decoders. To reduce the impact of individual variations, we propose a novel calibration module, called Easy-Calibration, which fine-tunes the basic model with subject-specific data and can be carried out efficiently without a screen. Experimental results demonstrate that our method achieves state-of-the-art performance on the public MPIIFaceGaze and EYEDIAP datasets and on our self-built IMRGaze dataset.
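The Figure 4 caption derives each Easy-Calibration label as the unit vector pointing from the face center to the camera lens center in the CCS. Assuming the CCS origin coincides with the lens center (the usual pinhole convention) and that a 3D face-center estimate $\bm{f}$ is already available for each frame, the label reduces to $-\bm{f}/\lVert\bm{f}\rVert$. The Python sketch below illustrates only this label derivation; `calibration_label` and `build_calibration_set` are hypothetical names, and the subsequent fine-tuning step is not shown.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def calibration_label(face_center: Vec3) -> Vec3:
    """Ground-truth gaze g_o for a frame in which the subject fixates the lens center.

    Assumes the CCS origin is the lens (optical) center, so the label is the
    unit vector from the face center toward the origin: -f / ||f||.
    """
    x, y, z = face_center
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("face center coincides with the lens center")
    return (-x / norm, -y / norm, -z / norm)


def build_calibration_set(
    frames: Sequence[Tuple[object, Vec3]],
) -> List[Tuple[object, Vec3]]:
    """Pair each captured image with its derived gaze label.

    frames: hypothetical (image, face_center_in_ccs) tuples recorded while the
    subject looks at the lens and moves the head freely.
    """
    return [(image, calibration_label(center)) for image, center in frames]
```

Because every label is computed from geometry alone, no screen or on-screen targets are needed; the head movement supplies the variety of gaze directions.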

Paper Structure

This paper contains 21 sections, 10 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Overview of MAGE, our proposed multi-task network architecture for human gaze estimation. The model takes as input the normalized RGB image and the facial bounding box provided by the Easy-Norm module, and predicts 6-DoF gaze with complete directional and positional information. The Easy-Calibration module efficiently supplies calibration data for fine-tuning MAGE.
  • Figure 2: The pipeline of Easy-Norm, which involves two steps: (1) standardize the camera parameters and transform the image and facial bounding box accordingly; (2) align the oCCS by rotating its z-axis toward the face center. $\bm{g}_n$ is derived by inversely rotating $\bm{g}_o$ accordingly.
  • Figure 3: PoGz is defined as the intersection point of the gaze vector $\bm{g}_o$ with the $XY$-plane of the CCS. The coordinates of the PoG in the SCS can then be derived from PoGz.
  • Figure 4: The pipeline of the Easy-Calibration module. Subjects gaze at the camera lens center while moving their heads, so that images are captured from multiple angles. The ground-truth gaze vector $\bm{g}_o$ for these images is derived as the unit vector pointing from the face center projection to the lens center in the CCS, and is then used to fine-tune the basic model.
  • Figure 5: Illustration of the IMRGaze dataset. (a) General data in IMRGaze are collected under three static head poses (middle, left, and right) and under free movement. (b) Calibration data in IMRGaze, in which the subjects gaze at the camera lens center while moving their heads freely.
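Step (2) of Easy-Norm in the Figure 2 caption can be illustrated with a small rotation sketch in Python. Here the normalized camera's z-axis is the unit vector toward the face center, and its x-axis is fixed by crossing the original y-axis with the new z-axis; this tie-breaking rule is an assumption and not necessarily the construction MAGE uses. Stacking the new axes as rows gives a matrix $R$ that maps oCCS coordinates into the normalized frame, so $\bm{g}_n = R\,\bm{g}_o$. Rotating the coordinate axes by some rotation is equivalent to rotating vectors by its inverse, which matches the caption's "inversely rotating". Step (1), standardizing the intrinsics and warping the image, is omitted, and all function names are hypothetical.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def _normalize(v: Vec3) -> Vec3:
    n = math.sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2])
    if n == 0.0:
        raise ValueError("zero-length vector")
    return (v[0] / n, v[1] / n, v[2] / n)


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalization_rotation(face_center: Vec3) -> Mat3:
    """Rows are the normalized camera's x, y, z axes expressed in the oCCS.

    z points at the face center; x is orthogonal to z and to the original
    y-axis (assumed rule; undefined if the face lies exactly on the y-axis).
    """
    z = _normalize(face_center)
    x = _normalize(_cross((0.0, 1.0, 0.0), z))
    y = _cross(z, x)  # unit length already, since z and x are orthonormal
    return [list(x), list(y), list(z)]


def apply(r: Mat3, v: Vec3) -> Tuple[float, ...]:
    """Compute R v, e.g. g_n = R g_o."""
    return tuple(sum(r[i][j] * v[j] for j in range(3)) for i in range(3))


def apply_transpose(r: Mat3, v: Vec3) -> Tuple[float, ...]:
    """Compute R^T v, e.g. map a normalized-frame gaze g_n back to g_o."""
    return tuple(sum(r[j][i] * v[j] for j in range(3)) for i in range(3))
```

After this rotation the face center lies on the new z-axis, at $(0, 0, \lVert\bm{f}\rVert)$. Because $R$ is orthonormal, `apply_transpose(R, g_n)` recovers $\bm{g}_o$ from a prediction made in the normalized frame.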