Intelligent support for Human Oversight: Integrating Reinforcement Learning with Gaze Simulation to Personalize Highlighting

Thorsten Klößner, João Belo, Zekun Wu, Jörg Hoffmann, Anna Maria Feit

TL;DR

The paper tackles how to support human oversight under time pressure by learning adaptive highlighting policies through reinforcement learning guided by models of user gaze. It formulates the monitoring interface and user attention as an MDP and trains a PPO agent in a simulated environment that balances alert benefits against cognitive costs via a gaze-driven state transition. By integrating a temporal saliency model (based on a fine-tuned TASED-Net) to predict gaze, the approach enables offline policy learning for a multi-drone oversight scenario ($N=4$, $|Attr|=8$) without real-world deployment. Preliminary qualitative results indicate that RL-based highlighting can outperform static rule-based highlighting, but substantial challenges remain in gaze-model fidelity, reward design, and empirical validation with real users.
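As a rough illustration of the formulation summarized above, the following toy environment tracks the true icon values, a simulated user's knowledge of them, and an optional highlight. It is a minimal sketch under stated assumptions, not the authors' environment: the weighted-random gaze sampler merely stands in for the learned saliency model, and the reward shape, `highlight_penalty`, and `change_prob` are hypothetical. No PPO training is included.

```python
import random

N_DRONES = 4                      # N = 4 in the paper's scenario
N_ATTRS = 8                       # |Attr| = 8 attribute icons per drone
N_ICONS = N_DRONES * N_ATTRS


class ToyOversightEnv:
    """Toy MDP: true icon values plus the user's last-seen values."""

    def __init__(self, highlight_penalty=0.1, change_prob=0.05, seed=0):
        self.highlight_penalty = highlight_penalty  # hypothetical cost of a highlight
        self.change_prob = change_prob              # hypothetical per-step flip chance
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.true = [0] * N_ICONS    # 1 = icon currently shows a critical value
        self.known = [0] * N_ICONS   # what the simulated user last saw
        return tuple(self.true), tuple(self.known)

    def _sample_gaze(self, highlighted):
        # Stand-in for a learned gaze model (assumption): a highlighted
        # icon is weighted more heavily when sampling the next fixation.
        weights = [1.0] * N_ICONS
        if highlighted is not None:
            weights[highlighted] = 20.0
        return self.rng.choices(range(N_ICONS), weights=weights, k=1)[0]

    def step(self, action):
        """action: index of the icon to highlight, or None for no highlight."""
        # Scenario dynamics: icon values change independently of the agent.
        for i in range(N_ICONS):
            if self.rng.random() < self.change_prob:
                self.true[i] = 1 - self.true[i]
        # Gaze-driven transition: the fixated icon's value becomes known.
        fixated = self._sample_gaze(action)
        self.known[fixated] = self.true[fixated]
        # Reward (assumed form): penalize stale knowledge and highlighting.
        stale = sum(t != k for t, k in zip(self.true, self.known))
        reward = -float(stale)
        if action is not None:
            reward -= self.highlight_penalty
        return (tuple(self.true), tuple(self.known)), reward, fixated
```

A policy would map the observation to an action at each step; for example, `env = ToyOversightEnv(); obs, reward, fixated = env.step(3)` highlights icon 3 for one step. A static rule-based baseline could always highlight the first icon whose true and known values differ, which gives a reference point for a learned policy.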

Abstract

Interfaces for human oversight must effectively support users' situation awareness under time-critical conditions. We explore reinforcement learning (RL)-based UI adaptation to personalize alerting strategies that balance the benefits of highlighting critical events against the cognitive costs of interruptions. To enable learning without real-world deployment, we integrate models of users' gaze behavior to simulate attentional dynamics during monitoring. Using a delivery-drone oversight scenario, we present initial results suggesting that RL-based highlighting can outperform static, rule-based approaches and discuss challenges of intelligent oversight support.

Paper Structure (10 sections, 2 equations, 3 figures)

Figures (3)

  • Figure 1: Example behavior of the learned interface policy as a critical situation affecting drone 3 occurs in the scenario. Only the region of the interface dedicated to drone 3 is shown. Each icon is annotated with its actual displayed value below, as well as the value of the user knowledge state in parentheses further below.
  • Figure 2: Dashboard-like interface for supervising four drones. Each drone panel displays eight attribute icons, while a map view provides spatial context. The interface is adopted from our previous work [wu2025understanding]. The UI agent learns to control the highlighting of icons. The map on the right serves only to increase the realism of the interface during user studies and is ignored by the UI agent.
  • Figure 3: Training curve of PPO with environment parameter $H = 500$ for the highlight penalty. On the x-axis, the number of training samples is shown. On the y-axis, the mean total reward per collected episode is shown.