Which Experimental Design is Better Suited for VQA Tasks? Eye Tracking Study on Cognitive Load, Performance, and Gaze Allocations

Sita A. Vriend; Sandeep Vidyapu; Amer Rama; Kun-Ting Chen; Daniel Weiskopf

TL;DR

This study examines how the order of image stimuli and questions, together with question modality, shapes cognitive load, accuracy, and gaze in visual question answering (VQA). It compares five experimental designs using eye tracking and both subjective and objective measures (NASA-TLX ratings, accuracy, hit-any-AOI rate (HAAR), and mean fixation duration) to identify designs that minimize extraneous cognitive burden. The IQ design is the most taxing and least accurate, whereas QI, IQI, and QIQ yield better performance with varied gaze patterns; the auditory AIA design may hinder comprehension. The results offer practical guidance for designing visualization experiments and gaze-based studies that rely on VQA tasks.

Abstract

We conducted an eye-tracking user study with 13 participants to investigate how stimulus-question ordering and question modality affect participants performing visual question-answering (VQA) tasks. We examined cognitive load, task performance, and gaze allocations across five distinct experimental designs, aiming to identify setups that minimize the cognitive burden on participants. The collected performance and gaze data were analyzed using quantitative and qualitative methods. Our results indicate a significant impact of stimulus-question ordering on cognitive load and task performance, as well as a noteworthy effect of question modality on task performance. These findings offer insights for the experimental design of controlled user studies in visualization research.

Paper Structure (15 sections, 3 figures)

Introduction
Related Work
Research Questions
Experimental Setup and Design
Stimuli Preparation
Experimental Designs and Dependent Variables
Apparatus and Pilot Studies
Participants and Experimental Procedure
Results and Analysis
RQ1: Does the presentation order of image stimulus and question impact CL? Does the modality of the question affect CL?
RQ2: Does the presentation order of image and question impact accuracy? Does the modality of the question have an effect?
RQ3: Does the presentation order of image and question impact the gaze allocations? Does the modality of the question have an effect?
Scanpaths and Aggregated Fixation Distribution
Comparative Statistical Analysis Based on Gaze Metrics
Discussion and Conclusion

Figures (3)

Figure 1: Example of (a) image stimulus, (b) corresponding task (question), and (c) response selection. The correct answer here is "right" because the tower is on the right side; however, the participant selected "left."
Figure 2: Violin plots of cognitive load according to NASA-TLX rating (A), task accuracy (B), hit-any-AOI rate per experimental design (C), and mean fixation duration measured in milliseconds (D). The horizontal black line in each plot represents the mean. Significant differences according to post-hoc tests are marked with asterisks (* p < 0.05; ** p < 0.01; *** p < 0.001).
Figure 3: Visual scanpath overlaid on images of three study designs of a selected question, where the number and radius indicate the fixation sequence and its duration, respectively. The yellow and red dots indicate the beginning and the end of a scanpath. Aggregated attention is displayed in density maps (d, h, l), while scarf plots (e, i, m) show the fixation duration over a target AOI (colored in light blue) across all participants. The task correctness is shown as green ticks (for correct answers) and red crosses (for incorrect answers). Percentage shows the relative fixation duration spent on the target AOI. Scarf plots in each row are ordered by decreasing relative fixation duration of a target AOI.
