Modeling Rational Adaptation of Visual Search to Hierarchical Structures

Saku Sourulahti, Christian P Janssen, Jussi PP Jokinen

TL;DR

The paper tackles how visual search efficiency can be enhanced by exploiting hierarchical structure under human memory limits. It introduces a reinforcement-learning–driven, computationally rational model that learns to search within visual hierarchies via a POMDP framework, without hard-coded strategies. Empirical data from a human experiment show that structured layouts reduce search times, and the model’s predictions align closely with human performance (R^2 ≈ 0.90, RMSE ≈ 0.38 s), especially at larger set sizes. The work advances understanding of adaptive visual search in structured environments and suggests practical guidelines for designing visually organized information spaces in HCI contexts. It highlights the potential of hierarchical memory-based strategies to inform UI layout optimization and eye-movement prediction, while outlining limitations and paths for extending memory structure and perceptual grouping in future work.

Abstract

Efficient attention deployment in visual search is limited by human visual memory, yet this limitation can be offset by exploiting the environment's structure. This paper introduces a computational cognitive model that simulates how the human visual system uses visual hierarchies to prevent refixations in sequential attention deployment. The model adopts computational rationality, positing behaviors as adaptations to cognitive constraints and environmental structures. In contrast to earlier models that predict search performance for hierarchical information, our model does not include predefined assumptions about particular search strategies. Instead, our model's search strategy emerges as a result of adapting to the environment through reinforcement learning algorithms. In an experiment with human participants we test the model's prediction that structured environments reduce visual search times compared to random tasks. Our model's predictions correspond well with human search performance across various set sizes for both structured and unstructured visual layouts. Our work improves understanding of the adaptive nature of visual search in hierarchically structured environments and informs the design of optimized search spaces.
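The refixation argument in the abstract can be illustrated with a toy simulation. This is only a sketch under stated assumptions and is not the authors' model: it involves no reinforcement learning and no POMDP, and the function names, the fixed memory capacity, the "forget the oldest entry first" rule, and the assumption that items inside a group are scanned systematically are all hypothetical choices made for illustration. It contrasts a searcher that can remember only a few individual items with one that spends the same memory slots on whole groups.

```python
import random
from collections import deque


def search_unstructured(n_items, capacity, rng):
    """Count fixations when memory holds only the last `capacity` item ids."""
    target = rng.randrange(n_items)
    memory = deque(maxlen=capacity)  # oldest remembered item is forgotten first
    fixations = 0
    while True:
        # Only items currently held in memory are protected from refixation.
        candidates = [i for i in range(n_items) if i not in memory]
        item = rng.choice(candidates)
        fixations += 1
        if item == target:
            return fixations
        memory.append(item)


def search_structured(n_groups, group_size, capacity, rng):
    """Count fixations when each memory slot stores a fully searched group."""
    target = rng.randrange(n_groups * group_size)
    memory = deque(maxlen=capacity)  # ids of groups already searched
    fixations = 0
    while True:
        candidates = [g for g in range(n_groups) if g not in memory]
        group = rng.choice(candidates)
        # Assumption: items inside a group are scanned once each, in order.
        for offset in range(group_size):
            fixations += 1
            if group * group_size + offset == target:
                return fixations
        memory.append(group)


def mean_fixations(search, trials, seed, **kwargs):
    """Average fixation count of `search` over `trials` seeded runs."""
    rng = random.Random(seed)
    return sum(search(rng=rng, **kwargs) for _ in range(trials)) / trials


if __name__ == "__main__":
    unstructured = mean_fixations(
        search_unstructured, trials=2000, seed=0, n_items=24, capacity=4
    )
    structured = mean_fixations(
        search_structured, trials=2000, seed=0,
        n_groups=4, group_size=6, capacity=4,
    )
    print(f"unstructured: {unstructured:.1f} fixations on average")
    print(f"structured:   {structured:.1f} fixations on average")
```

With the same four memory slots, the grouped searcher never revisits a searched group in this configuration, so it needs at most 24 fixations, while the item-level searcher keeps refixating forgotten items. The paper's model differs in that such a strategy is not scripted but is learned through reinforcement learning.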

Paper Structure (22 sections, 1 equation, 7 figures, 1 table)

This paper contains 22 sections, 1 equation, 7 figures, 1 table.

Figures (7)

  • Figure 1: The model predicts how eye movement trajectories adapt to visual task structure. Black rectangles are elements of the layout, and blue lines represent the eye movement search path from fixation to fixation in numerical order. Left (a): search through randomly arranged visual elements. Right (b): search through visually structured elements.
  • Figure 2: The model's architecture, where the agent makes decisions based on the state of the external environment, but through the internal environment's state. This internal state changes via the perception of stimuli into an internal representation relative to cognitive capacity and constraints. The agent's new action shifts the fixation to the location of a new element, and the duration taken for the saccade is calculated using the EMMA model, which results in a negative reward for the agent. A new element, in response, causes a new state change in the external environment.
  • Figure 3: The series of figures illustrates the change in the model's internal state as the simulated eye movement proceeds from element to element, aiming to encode one spatial group at a time. The eye-movement figures demonstrate how the model's encoding proceeds when it is optimized to make full use of hierarchical VSTM. The model's internal state also includes static information about visual grouping and spatial locations. Each group of elements is bounded by a blue dashed line, and group numbers are shown in the first stage (a) of the series.
  • Figure 4: Logarithmically transformed distribution of the experiment results on the left (a). Logarithmically transformed distribution of residuals from the linear mixed model shown on the right (b).
  • Figure 5: Estimated marginal mean search times for each condition.
  • ...and 2 more figures