MAGE: A Multi-task Architecture for Gaze Estimation with an Efficient Calibration Module

Haoming Huang; Musen Zhang; Jianxin Yang; Zhen Li; Jinkai Li; Yao Guo

TL;DR

Proposes MAGE, a multi-task architecture that estimates complete 6-DoF gaze from RGB facial images without screen-based calibration. It combines the Easy-Norm normalization scheme with the Easy-Calibration module to support subject-specific fine-tuning and cross-subject generalization, and reports state-of-the-art results on MPIIFaceGaze, EYEDIAP, and IMRGaze. The model outputs both the gaze direction and the Point-of-Gaze (PoG), and uses multi-task constraints to improve consistency among the directional, positional, and normalization components. With its efficient calibration, the approach targets practical, device-independent 3D gaze analysis for real-world HRI applications.

Abstract

Eye gaze can provide rich information on human psychological activities, and has garnered significant attention in the field of Human-Robot Interaction (HRI). However, existing gaze estimation methods predict only the gaze direction or only the Point-of-Gaze (PoG) on a screen, and therefore fail to provide sufficient information for a comprehensive six Degree-of-Freedom (DoF) gaze analysis in 3D space. Moreover, variations in eye shape and structure among individuals impede the generalization capability of these methods. In this study, we propose MAGE, a Multi-task Architecture for Gaze Estimation with an efficient calibration module, to predict 6-DoF gaze information that is applicable to real-world HRI. Our basic model encodes both directional and positional features from facial images, and predicts gaze results with a dedicated information flow and multiple decoders. To reduce the impact of individual variations, we propose a novel calibration module, namely Easy-Calibration, to fine-tune the basic model with subject-specific data; it is efficient to implement and does not require a screen. Experimental results demonstrate that our method achieves state-of-the-art performance on the public MPIIFaceGaze and EYEDIAP datasets and on our self-built IMRGaze dataset.
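
To illustrate why direction-only or screen-only predictions fall short of a full 3D description, the sketch below treats gaze as a ray with a 3D origin and a 3D direction and intersects it with an arbitrary plane. This parametrization, the function name `point_of_gaze`, and the coordinate values are assumptions made for illustration; they are not taken from the paper and may differ from how MAGE represents its outputs.

```python
def point_of_gaze(origin, direction, plane_point, plane_normal):
    """Intersect the gaze ray origin + t * direction (t >= 0) with a plane.

    All arguments are 3-element sequences in the same coordinate frame.
    Returns the 3D intersection point, or None if the ray is parallel to
    the plane or points away from it. Hypothetical helper, not from the paper.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    denom = dot(direction, plane_normal)
    if abs(denom) < 1e-9:
        return None  # ray is parallel to the plane
    offset = tuple(p - o for p, o in zip(plane_point, origin))
    t = dot(offset, plane_normal) / denom
    if t < 0:
        return None  # plane lies behind the gaze origin
    return tuple(o + t * d for o, d in zip(origin, direction))


# Example: an eye 1 m in front of the plane z = 0, looking straight at it.
hit = point_of_gaze(
    origin=(0.1, 0.2, 1.0),
    direction=(0.0, 0.0, -1.0),
    plane_point=(0.0, 0.0, 0.0),
    plane_normal=(0.0, 0.0, 1.0),
)
```

Under this assumed representation, a direction alone cannot locate the intersection without the origin, and a point on one screen plane does not determine the ray for other surfaces, which is the gap a combined positional and directional estimate is meant to close.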
