Finding Shared Decodable Concepts and their Negations in the Brain

Cory Efird, Alex Murphy, Joel Zylberberg, Alona Fyshe

TL;DR

The contrastive-learning methodology better characterizes new and existing visuo-semantic representations in the brain by leveraging multimodal neural network representations and a novel adaptation of clustering algorithms.

Abstract

Prior work has offered evidence for functional localization in the brain; different anatomical regions preferentially activate for certain types of visual input. For example, the fusiform face area preferentially activates for visual stimuli that include a face. However, the spectrum of visual semantics is extensive, and only a few semantically tuned patches of cortex have so far been identified in the human brain. Using a multimodal (natural language and image) neural network architecture (CLIP), we train a highly accurate contrastive model that maps brain responses during naturalistic image viewing to CLIP embeddings. We then use a novel adaptation of the DBSCAN clustering algorithm to cluster the parameters of these participant-specific contrastive models. This reveals what we call Shared Decodable Concepts (SDCs): clusters in CLIP space that are decodable from common sets of voxels across multiple participants. Examining the images most and least associated with each SDC cluster gives us additional insight into the semantic properties of each SDC. We note SDCs for previously reported visual features (e.g., orientation tuning in early visual cortex) as well as visual semantic concepts such as faces, places, and bodies. In cases where our method finds multiple clusters for a visuo-semantic concept, the least associated images allow us to dissociate between confounding factors. For example, we discovered two clusters of food images, one driven by color, the other by shape. We also uncover previously unreported areas, such as regions of the extrastriate body area (EBA) tuned for legs/hands and sensitivity to numerosity in the right intraparietal sulcus. Thus, our contrastive-learning methodology better characterizes new and existing visuo-semantic representations in the brain by leveraging multimodal neural network representations and a novel adaptation of clustering algorithms.
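
As a rough illustration of how the images "most and least associated" with an SDC might be identified, the sketch below follows the steps described in the Figure 1 caption: decoder rows for the selected voxels are averaged into a cluster centroid, and brain-derived CLIP embeddings are ranked by their cosine similarity to that centroid. This is a minimal sketch, not the authors' code. The function names (`cluster_centroid`, `rank_images`), the synthetic data, and the choice to pool all selected rows before averaging are assumptions made for illustration.

```python
import numpy as np

def cluster_centroid(decoders, masks):
    """Average the decoder rows of the selected voxels over voxels and participants.

    decoders: list of (n_voxels_k, d) arrays, one linear decoder per participant.
    masks:    list of boolean (n_voxels_k,) arrays marking the cluster's voxels.
    Assumption: selected rows are pooled across participants, then averaged.
    """
    rows = np.concatenate([W[m] for W, m in zip(decoders, masks)], axis=0)
    return rows.mean(axis=0)

def rank_images(centroid, predicted, top_k=3):
    """Rank images by cosine similarity (1 - cosine distance) to the centroid.

    predicted: (n_images, d) array of brain-derived CLIP embeddings.
    Returns indices of the top_k most and least similar images, plus all scores.
    """
    c = centroid / np.linalg.norm(centroid)
    P = predicted / np.linalg.norm(predicted, axis=1, keepdims=True)
    sims = P @ c
    order = np.argsort(sims)          # ascending similarity
    positive = order[::-1][:top_k]    # most associated images
    negative = order[:top_k]          # most negatively associated images
    return positive, negative, sims

# Synthetic stand-ins for three participants with 512-dimensional decoders.
rng = np.random.default_rng(0)
d = 512
decoders = [rng.normal(size=(n, d)) for n in (40, 50, 60)]
masks = []
for W in decoders:
    m = np.zeros(W.shape[0], dtype=bool)
    m[:5] = True                      # hypothetical cluster: first five voxels
    masks.append(m)

centroid = cluster_centroid(decoders, masks)

# Synthetic predicted embeddings; images 0 and 1 are planted as extremes.
predicted = rng.normal(size=(100, d))
predicted[0] = centroid
predicted[1] = -centroid

positive, negative, sims = rank_images(centroid, predicted)
print("positive:", positive, "negative:", negative)
```

In this toy setup the planted images 0 and 1 come out as the top positive and top negative image, respectively. With real data, `predicted` would be the stacked, repetition-averaged decoder outputs on held-out fMRI data.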

Paper Structure (36 sections, 2 equations, 22 figures)

Figures (22)

  • Figure 1: Deriving SDC clusters. (a): The participant-specific linear decoders derived in Figure \ref{sup:decoding_procedure}. (b): Our modified DBSCAN clustering procedure is applied to the linear decoders (see Figure 2 for details). (c): Our DBSCAN procedure derives binary masks over the voxels in the linear decoders for a specified number of clusters (one of which is highlighted in orange). (d): The rows corresponding to the selected voxels in the binary masks are extracted from the linear decoder matrices. (e): The 512-dimensional representations from the previous step are averaged over voxels and participants to derive a cluster centroid for each cluster derived from DBSCAN. We visualize the cluster centroid for the first DBSCAN cluster. (f): The linear decoders from (a) are applied to held-out fMRI data per participant. Each participant saw images multiple times, so the matrix of predicted CLIP embeddings $\hat{Y}^{k}_{Test}$ is averaged over these repetitions and all linear decoding matrices are stacked (across participants) to give $\hat{Y}^{AVG}_{Test}$. (g): Cosine distance is calculated between the cluster centroids (e) and the brain-derived CLIP embeddings (f). (h): The images most associated with the cluster centroids (positive images) and most negatively associated with the cluster centroids (negative images) are identified. The positive / negative images for the SDC cluster pictured here appear to correspond to global vertical/horizontal orientation in the associated images. (i): Color-coded participant-specific voxel clusters are displayed on a flatmap of the brain's cortical surface in common fsaverage space (overlapping areas are displayed in white). Region-of-interest labels are outlined in white on the flatmap image. For the specified cluster (e), whose positive / negative images are associated with orientation, the flatmap indicates bilateral shared voxel clusters in early visual cortex.
  • Figure 2: Illustration of our DBSCAN variant applied to multi-participant fMRI data. There are 3 participants in this example and $\texttt{minNeighbors}=2$. (a): A zoomed-in view of a cluster with three core points. The outer ring around each point shows its $\varepsilon$-neighborhood and whether it is a core, border, or outlier point. Black arrows emphasize points that are neighbors. Points with neighbors from at least 2 other distinct participants are marked as core points. Non-core points that neighbor core points are added to the cluster as border points. The remaining points are marked as outliers. (b): A zoomed-out sketch of a set of points that form 3 clusters. Since $\texttt{minNeighbors}=2$, a high-density region will form a cluster if and only if it contains points from at least 3 participants.
  • Figure 3: Cluster 7 ($\varepsilon = 0.55$). Positive images are strongly associated with faces, while the negative images represent depictions of people whose faces are not visible. Voxel clusters are primarily found in bilateral FFA and EBA.
  • Figure 4: Cluster 0 ($\varepsilon = 0.55$). Positive images are associated with food and color. Negative images are entirely grayscale. Voxel clusters span bilateral FFA, V4, and PPA.
  • Figure 5: Cluster 2 ($\varepsilon = 0.55$). Positive images are strongly associated with presence of legs, while negative images are typically people at tables whose legs are obscured. Voxel clusters are primarily in bilateral EBA.
  • ...and 17 more figures
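
The core/border/outlier rule described in the Figure 2 caption can be sketched in a few lines. The code below is an illustrative toy, not the authors' implementation: the function name `participant_dbscan`, the use of Euclidean distance, and the 2-D example points are assumptions. The only change from textbook DBSCAN is the core-point test, which counts distinct other participants in the $\varepsilon$-neighborhood rather than raw neighbor counts.

```python
import numpy as np

def participant_dbscan(X, participant, eps, min_neighbors):
    """DBSCAN-style clustering where density is counted in distinct participants.

    X:           (n, d) array of points (Euclidean distance is an assumption).
    participant: length-n sequence of participant ids, one per point.
    A point is a core point if its eps-neighborhood contains points from at
    least `min_neighbors` distinct participants other than its own.
    Returns (labels, is_core); label -1 marks outliers.
    """
    X = np.asarray(X, dtype=float)
    participant = list(participant)
    n = len(X)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    idx = np.arange(n)
    neighbors = [np.flatnonzero((dist[i] <= eps) & (idx != i)) for i in range(n)]

    is_core = np.array([
        len({participant[j] for j in nb} - {participant[i]}) >= min_neighbors
        for i, nb in enumerate(neighbors)
    ])

    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:                      # grow the cluster through core points
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster   # core or border point joins the cluster
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels, is_core

# Toy data: 3 participants (ids 0, 1, 2), min_neighbors = 2 as in Figure 2.
X = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),      # one point per participant: core points
    (0.35, 0.0),                             # near a single core point: border point
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),      # dense, but all from participant 0
    (10.0, 10.0), (10.1, 10.0),              # only two participants present
]
participant = [0, 1, 2, 0, 0, 0, 0, 1, 2]

labels, is_core = participant_dbscan(X, participant, eps=0.3, min_neighbors=2)
print(labels.tolist())
print(is_core.tolist())
```

In this example only the first group forms a cluster. The dense single-participant group and the two-participant pair are both marked as outliers, matching the caption's statement that a cluster requires points from at least 3 participants when $\texttt{minNeighbors}=2$.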