Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models

Angela Lopez-Cardona, Carlos Segura, Alexandros Karatzoglou, Sergi Abadal, Ioannis Arapakis

TL;DR

This work tackles the challenge of aligning LLM outputs with human preferences by introducing GazeReward, which injects implicit eye-tracking feedback into the reward model. It generates ET features with two predictors, fuses them with text via GazeConcat or GazeAdd, and trains a regression-based RM to predict human preference signals. Across multiple models and datasets, the approach yields consistent RM accuracy gains, with RewardBench showing substantial relative improvements for Mistral-7B, demonstrating the potential of cognitive signals to enhance AI alignment at scale. The findings suggest that incorporating eye-tracking data can complement explicit feedback and enable cost-effective, scalable improvements in human-aligned NLP systems.
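The two fusion modes named above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that GazeAdd sums a projected ET embedding with each token embedding, and that GazeConcat joins the two embedding sequences along the sequence axis. The `project` helper, the weight matrix, the feature values, and the ET-first ordering are all hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def project(et_features: Matrix, weights: Matrix) -> Matrix:
    """Map per-token ET features (n_tokens x n_feats) to the text embedding size.

    `weights` is a hypothetical (n_feats x d_model) projection matrix.
    """
    return [
        [sum(f * w for f, w in zip(row, col)) for col in zip(*weights)]
        for row in et_features
    ]


def gaze_add(text_emb: Matrix, et_emb: Matrix) -> Matrix:
    """Assumed GazeAdd: element-wise sum, so the sequence length is unchanged."""
    if len(text_emb) != len(et_emb):
        raise ValueError("GazeAdd needs one ET vector per token")
    return [
        [t + e for t, e in zip(t_row, e_row)]
        for t_row, e_row in zip(text_emb, et_emb)
    ]


def gaze_concat(text_emb: Matrix, et_emb: Matrix) -> Matrix:
    """Assumed GazeConcat: join along the sequence axis (ET first is a guess)."""
    return et_emb + text_emb


# Toy input: 2 tokens, embedding size 3, 2 synthetic ET features per token.
text_emb = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]
et_features = [[0.2, 0.4], [0.1, 0.0]]
weights = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]

et_emb = project(et_features, weights)
added = gaze_add(text_emb, et_emb)            # still 2 positions
concatenated = gaze_concat(text_emb, et_emb)  # 4 positions
```

Under these assumptions, the practical difference is that the additive variant keeps the input length fixed, while the concatenating variant lengthens the sequence the reward model has to process.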

Abstract

Advancements in Natural Language Processing (NLP) have led to the emergence of Large Language Models (LLMs) such as GPT, Llama, Claude, and Gemini, which excel across a range of tasks but require extensive fine-tuning to align their outputs with human expectations. A widely used method for achieving this alignment is Reinforcement Learning from Human Feedback (RLHF), which, despite its success, faces challenges in accurately modelling human preferences. In this paper, we introduce GazeReward, a novel framework that integrates implicit feedback -- and specifically eye-tracking (ET) data -- into the Reward Model (RM). In addition, we explore how ET-based features can provide insights into user preferences. Through ablation studies we test our framework with different integration methods, LLMs, and ET generator models, demonstrating that our approach significantly improves the accuracy of the RM on established human preference datasets. This work advances the ongoing discussion on optimizing AI alignment with human values, exploring the potential of cognitive data for shaping future NLP research.

Paper Structure (24 sections, 2 equations, 9 figures, 8 tables)

Figures (9)

  • Figure 1: GazeReward framework for using eye-tracking (ET) data in reward modelling. A generator model computes ET features on a preference dataset $D$, and the reward model is trained on human preferences by combining text and ET embeddings (see the method section for details).
  • Figure 2: Overview of the GazeReward framework, which incorporates eye-tracking features into the reward model. The figure shows the architecture with the second ET prediction model; it is identical when the first model is used instead (see the experiments section).
  • Figure 4: Validation loss with different LR on ConcatReward, batch size 8, features: $f1$ and Meta-Llama-3-8B-Instruct base model
  • Figure 5: Validation loss with different batch size, learning rate: 5e-5, features: $f1$ and Meta-Llama-3-8B-Instruct base model
  • Figure 6: ConcatReward, LR: 0.00001, features: $f1$ and Meta-Llama-3-8B-Instruct base model
  • ...and 4 more figures