Multi-task Gaze Estimation Via Unidirectional Convolution

Zhang Cheng, Yanxia Wang

TL;DR

Lightweight gaze-estimation backbones lose accuracy because their small number of feature channels limits representational capacity. The paper introduces Multitask-Gaze, a compact architecture built from four components: Unidirectional Convolution (UC), Spatial and Channel Attention (SCA), a Global Convolution Module (GCM), and a Multi-task Regression Module (MRM). Together they extend the receptive field and fuse inter-channel information without a large parameter cost. Compared with the state-of-the-art method SUGE, Multitask-Gaze improves performance on MPIIFaceGaze and Gaze360 by 1.71% and 2.75%, respectively, while reducing parameters and FLOPs by 75.5% and 86.88%; the paper also reports improvements over MobileNetV3 baselines on multiple datasets. Ablation studies examine the individual contributions of UC, SCA, GCM, and MRM, and show that GCM and UC provide meaningful gains in global information integration and receptive-field expansion, yielding a practical, plug-and-play lightweight gaze estimator.

Abstract

Using lightweight models as backbone networks in gaze estimation tasks often results in significant performance degradation. The main reason is that lightweight networks usually have few feature channels, which limits the model's expressive ability. To improve the performance of lightweight models in gaze estimation tasks, a network model named Multitask-Gaze is proposed. Its main components are Unidirectional Convolution (UC), Spatial and Channel Attention (SCA), the Global Convolution Module (GCM), and the Multi-task Regression Module (MRM). UC not only significantly reduces the number of parameters and FLOPs, but also extends the receptive field and improves the model's long-distance modeling capability, thereby improving performance. SCA highlights gaze-related features and suppresses gaze-irrelevant features. GCM replaces the pooling layer and avoids the performance degradation caused by information loss. MRM improves the accuracy of individual tasks and strengthens the connections between tasks, improving overall performance. Experimental results show that, compared with the state-of-the-art method SUGE, Multitask-Gaze improves performance on the MPIIFaceGaze and Gaze360 datasets by 1.71% and 2.75%, respectively, while reducing the number of parameters and FLOPs by 75.5% and 86.88%.
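
To make the reported numbers easier to read, here is a minimal sketch of the metric usually used on gaze benchmarks such as MPIIFaceGaze and Gaze360: the angle between the predicted and ground-truth 3D gaze vectors. It is not taken from the paper. The function names `angular_error_deg` and `relative_reduction_pct` are hypothetical. The sketch also assumes that the quoted percentages are relative reductions against the SUGE baseline, which the abstract does not state explicitly.

```python
import numpy as np


def angular_error_deg(pred, target):
    """Per-sample angle in degrees between predicted and ground-truth gaze vectors.

    Both inputs have shape (..., 3); they are normalized here, so they do not
    need to be unit length on entry.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    pred = pred / np.linalg.norm(pred, axis=-1, keepdims=True)
    target = target / np.linalg.norm(target, axis=-1, keepdims=True)
    # Clip guards against values slightly outside [-1, 1] from rounding.
    cos = np.clip(np.sum(pred * target, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))


def relative_reduction_pct(baseline, new):
    """Relative reduction of `new` against `baseline`, in percent.

    Assumption: this is how an "improved by x%" or "reduced by x%" figure
    would be computed for a quantity where lower is better (angular error,
    parameter count, FLOPs).
    """
    return 100.0 * (baseline - new) / baseline
```

Under that assumption, a 75.5% reduction in parameters means the model keeps 24.5% of the baseline's parameter count; the same reading applies to the FLOPs figure.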

Paper Structure

This paper contains 12 sections, 4 equations, 5 figures, and 4 tables.

Figures (5)

  • Figure 2: Structure of bneck
  • Figure 3: Structure of SCA
  • Figure 4: Structure of GCM
  • Figure 5: Structure of MRM
  • Figure 6: Visualization of receptive field