
DiffEyeSyn: Diffusion-based User-specific Eye Movement Synthesis

Chuhan Jiao, Guanhua Zhang, Yeonjoo Cho, Zhiming Hu, Andreas Bulling

TL;DR

DiffEyeSyn introduces a diffusion-based method to synthesize high-frequency, user-specific eye movements by injecting user-specific noise into any given sequence conditioned on embeddings from a pretrained user authenticator. It adds a novel user identity guidance loss to preserve identity in the velocity domain while producing human-like gaze dynamics, and adapts DiffWave for efficient high-frequency sequence generation. Experiments on GazeBase and JuDo1000 show that the method preserves user-specific characteristics, yields realistic velocity distributions, and can augment gaze datasets to boost downstream tasks such as gaze-based user identification. This work lays the groundwork for personalised gaze synthesis with broad applications in character animation, eye movement biometrics, and gaze data imputation.

Abstract

High-frequency gaze data contains more user-specific information than low-frequency data, which makes it promising for various applications. However, existing gaze modelling methods focus on low-frequency data and ignore the subtle, user-specific eye movements that only high-frequency data captures. We present DiffEyeSyn -- the first computational method to synthesise eye movements specific to individual users. The key idea is to consider the user-specific information as a special type of noise in eye movement data. This perspective recasts eye movement synthesis as the task of injecting this user-specific noise into any given eye movement sequence. We formulate this injection task as a conditional diffusion process in which the synthesis is conditioned on user-specific embeddings extracted from the gaze data using pre-trained models for user authentication. We propose user identity guidance -- a novel loss function that allows our model to preserve user identity while generating human-like eye movements in the spatial domain. Experiments on two public datasets show that our synthetic eye movements preserve user-specific characteristics and are more realistic than those of baseline approaches. Furthermore, we demonstrate that DiffEyeSyn can synthesise large-scale gaze data and support various downstream tasks, such as gaze-based user identification. As such, our work lays the methodological foundations for personalised eye movement synthesis, which has significant application potential in areas such as character animation, eye movement biometrics, and gaze data imputation.

Paper Structure (26 sections, 14 equations, 5 figures, 4 tables, 2 algorithms)

Figures (5)

  • Figure 1: Training pipeline of DiffEyeSyn. DiffEyeSyn is trained in a self-supervised way. The original eye movement data $g_0$ and its identity-removed variant $g_0^{co}$ are converted into velocities $x_0$ and $x_0^{co}$. The goal is to train the diffusion model to inject the removed identity information back into $x_0^{co}$. We use a pretrained user embedder to extract the user-specific embedding from $x_0$. At each diffusion timestep $t$, given $x_0^{co}$ and the user embedding as conditions, DiffEyeSyn predicts the noise $\hat{\epsilon}_t$ that converts $x_0$ to $x_t$. Instead of optimising DiffEyeSyn only with the standard diffusion loss, which minimises the difference between the predicted and ground-truth noise, we propose user identity guidance $\mathcal{L}_{id}$, a novel loss function that constrains the synthesised data to contain the given user-specific information. More specifically, we estimate the clean eye movement velocity $\hat{x}_0$ by denoising $x_t$ with the predicted noise $\hat{\epsilon}_t$. The proposed user identity guidance maximises the cosine similarity between the embeddings of $\hat{x}_0$ and $x_0$.
  • Figure 2: Architecture of DiffEyeSyn. It contains 30 residual layers with bidirectional dilated convolutions (Bi-DilConv), which keep training and inference memory- and time-efficient when generating very high-frequency eye movements.
  • Figure 3: Qualitative comparison between the identity-removed eye movements, high-pass filter synthesised eye movements, DiffEyeSyn synthesised eye movements, and ground-truth human eye movements in four tasks (HSS, RAN, TEX, FXS) within a 5-second time window. For each sequence of eye movements, we visualise its velocities (above) and gaze direction (below). An empty panel means that the method failed to produce valid results.
  • Figure 4: Four examples of the user identity manipulation task. Left: the eye movements used to extract the target user embedding. Middle 1: the eye movements from different users into which DiffEyeSyn injects the target user information. Middle 2: high-pass filter synthesised eye movements. Right: DiffEyeSyn synthesised eye movements. For each sequence of eye movements, we visualise its velocities (above) and gaze direction (below). An empty panel means that the method failed to produce valid results.
  • Figure 5: Four examples of the user identity manipulation task with the same base eye movement sequence. Left: the eye movements used to extract the target user embedding. Middle: the eye movements into which DiffEyeSyn injects the target user information. Right: DiffEyeSyn synthesised eye movements. For each sequence of eye movements, we visualise its velocities (above) and gaze direction (below).
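
The training step described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the standard DDPM parameterisation $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$, so that $\hat{x}_0 = (x_t - \sqrt{1-\bar{\alpha}_t}\,\hat{\epsilon}_t)/\sqrt{\bar{\alpha}_t}$, and it writes the identity guidance as $1 - \cos(\cdot,\cdot)$, which is minimised exactly when the cosine similarity is maximised. All function names are hypothetical. `toy_embedder` is a stand-in for the pretrained user embedder, and the predicted noise `eps_hat` is passed in directly, whereas the real model predicts it from $x_t$, $x_0^{co}$, and the user embedding. How the two losses are weighted is not stated in the caption, so they are returned separately.

```python
import math
import random


def to_velocity(gaze, sampling_rate_hz):
    """Finite-difference velocity of a 1-D gaze trace (assumed conversion)."""
    return [(b - a) * sampling_rate_hz for a, b in zip(gaze, gaze[1:])]


def q_sample(x0, noise, alpha_bar_t):
    """Forward diffusion: x_t = sqrt(a_bar) * x_0 + sqrt(1 - a_bar) * eps."""
    s, n = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [s * x + n * e for x, e in zip(x0, noise)]


def estimate_x0(xt, eps_hat, alpha_bar_t):
    """Invert the forward step using the predicted noise to estimate x_0."""
    s, n = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    return [(x - n * e) / s for x, e in zip(xt, eps_hat)]


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)  # small constant guards against zero vectors


def toy_embedder(x):
    """Hypothetical stand-in for the pretrained user embedder.

    Returns two summary statistics (mean absolute value, standard deviation);
    the real embedder is a learned authentication network.
    """
    mean = sum(x) / len(x)
    mean_abs = sum(abs(v) for v in x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [mean_abs, std]


def training_losses(x0, noise, eps_hat, alpha_bar_t, embed=toy_embedder):
    """Return (noise-prediction loss, identity guidance loss) for one step."""
    xt = q_sample(x0, noise, alpha_bar_t)
    x0_hat = estimate_x0(xt, eps_hat, alpha_bar_t)
    # Standard diffusion loss: mean squared error on the noise.
    l_noise = sum((e - p) ** 2 for e, p in zip(noise, eps_hat)) / len(noise)
    # Identity guidance: pull the embedding of x0_hat towards that of x0.
    l_id = 1.0 - cosine_similarity(embed(x0_hat), embed(x0))
    return l_noise, l_id


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = to_velocity([0.0, 0.1, 0.3, 0.2, 0.5], sampling_rate_hz=1000.0)
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    eps_hat = [0.0 for _ in x0]  # placeholder for the network's prediction
    print(training_losses(x0, noise, eps_hat, alpha_bar_t=0.5))
```

Because $\hat{x}_0$ is a differentiable function of the predicted noise, a loss on its embedding can be backpropagated into the noise predictor alongside the ordinary noise-prediction loss.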