Neural network-based encoding in free-viewing fMRI with gaze-aware models

Dora Gozukara, Nasir Ahmad, Katja Seeliger, Djamari Oetringer, Linda Geerligs

Abstract

Representations learned by convolutional neural networks (CNNs) closely resemble information processing patterns observed in the primate visual system. This resemblance has been demonstrated on large neuroimaging datasets collected under diverse, naturalistic visual stimulation, but with participants instructed to maintain central fixation. This viewing condition, however, diverges significantly from ecologically valid visual behaviour, suppresses activity in visually active regions, and imposes substantial cognitive load on the viewing task. We present a modification of the encoding model framework that adapts it to naturalistic vision datasets acquired under fully natural viewing conditions, without fixation, by incorporating eye-tracking data. Our gaze-aware encoding models were trained on the StudyForrest dataset, which features task-free naturalistic movie viewing. By combining eye-tracking data with the visual content of movie frames, we generate subject-wise feature time series that are specific to each participant's gaze and the stimulus. These time series are constructed by sampling only the locally and temporally relevant elements of the CNN feature map for each fixation. Our results demonstrate that gaze-aware encoding models match the performance of conventional encoding models with 112x fewer model parameters. Gaze-aware encoding models were especially beneficial for participants with more dynamic eye-movement patterns. This approach therefore opens the door to more ecologically valid models that can be built in more naturalistic settings, such as playing games or navigating virtual environments.
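
The sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `sample_gaze_features`, the array layout `(T, C, H, W)`, the use of one valid gaze point per frame in normalised `[0, 1]` coordinates, and the nearest-cell lookup are all assumptions made for illustration.

```python
import numpy as np

def sample_gaze_features(feature_maps, gaze_xy):
    """Pick, for every frame, the CNN feature vector at the gazed location.

    feature_maps: array of shape (T, C, H, W), one feature map per frame
                  (hypothetical layout).
    gaze_xy:      array of shape (T, 2) with normalised (x, y) gaze positions
                  in [0, 1]; assumed to contain no missing samples.
    Returns an array of shape (T, C).
    """
    feature_maps = np.asarray(feature_maps)
    gaze_xy = np.asarray(gaze_xy, dtype=float)
    n_frames, _, height, width = feature_maps.shape

    # Convert normalised gaze coordinates to the nearest feature-map cell,
    # clipping so that a gaze sample on the border stays inside the map.
    cols = np.clip((gaze_xy[:, 0] * width).astype(int), 0, width - 1)
    rows = np.clip((gaze_xy[:, 1] * height).astype(int), 0, height - 1)

    # Advanced indexing over (frame, row, col) keeps the channel axis,
    # so each frame contributes a single C-dimensional feature vector.
    return feature_maps[np.arange(n_frames), :, rows, cols]


# Toy example with made-up sizes: 2 frames, 3 channels, a 4 x 5 feature map.
maps = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
gaze = np.array([[0.0, 0.0], [0.99, 0.99]])
gaze_features = sample_gaze_features(maps, gaze)
```

In this toy setting a model built on the full feature map would see `C * H * W` values per frame, whereas the gaze-sampled series has only `C`. This is the kind of reduction that gaze-based sampling provides, although the sizes here are illustrative and unrelated to the 112x figure reported above.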

Paper Structure (27 sections, 3 equations, 9 figures, 1 table)


Figures (9)

  • Figure 1: Schematic outlining the most important aspects of gaze-aware encoding models. (a) Unique gaze patterns are collected for each individual. (b) For computational efficiency, and to be able to build one model across all layers, we combined CNN model features into a hyperlayer feature map. However, gaze-aware encoding models could also be built for separate layers. (c) In this way, the gaze-aware encoding model in each voxel is based specifically on the CNN features that are relevant to the information processed by each individual. (d) Traditional CNN-based encoding models use the whole feature sets of layers. (e) Diagram showing how our approach compares to other approaches in the literature in terms of the number of model parameters and the type of constraints used to reduce that number (synthetic or behavioural constraints).
  • Figure 1: Single subject gaze-aware model performances. Cortical maps show gaze-aware model performances for all participants. The best performing voxels vary considerably between individuals.
  • Figure 2: Gaze-aware models match baseline models. (a) Histograms showing group average model performances for all modelled voxels. (b) Cortical maps show group average model performances for voxels that are statistically significantly predicted by each model. Gaze-aware and baseline models predict the same range of voxels with very similar performances. Center fixation models predict fewer voxels. PCA baseline models reach statistical significance for only 3% of voxels, and thus are not shown. (c) Cortical map shows group average model performance differences between the gaze-aware model and the baseline model. (d) Violin plots show model performances in bilateral V1, V2, V3, LOc, and FG. Each violin plot shows the distribution of average model performance within an ROI across all participants. Each horizontal line shows one participant. Gaze-aware models are not statistically significantly different from baseline models in any ROI.
  • Figure 2: Single subject baseline model performances. Cortical maps show baseline model performances for all participants. The best performing voxels vary considerably between individuals.
  • Figure 3: Baseline models learn from a spatial distribution that has some correspondence to the subject's gaze, but only gaze-aware models improve with dynamic viewing. (a) Each row shows one subject. The left column shows each subject's gaze distribution heatmap. The middle and right columns show the spatial distribution of weights learned by the baseline model: the middle column for the best performing voxel, the right column for the worst performing voxel. (b) Cortical maps show the group-averaged correlations between the gaze distribution and each voxel's spatial weight distribution learned by the baseline model. (c) Top: Scatter plot shows the relationship between model performance and the similarity between subject gaze and learned spatial model weight distributions. Note that the spatial weight distribution is a property of the baseline models. Therefore, the x-axis for the gaze-aware models shows the similarity between the gaze distribution and the spatial weight distribution of the baseline model for the corresponding subject. Bottom: Scatter plot shows the relationship between model performance and the normalized number of fixations subjects made during free-viewing. (d) Shannon entropy of the spatial weight distributions learned by the baseline model, group-averaged and projected onto the cortex.
  • ...and 4 more figures