Sparse components distinguish visual pathways & their alignment to neural networks

Ammar I Marvi, Nancy G Kanwisher, Meenakshi Khosla

TL;DR

This work addresses why deep neural networks trained for object recognition align with multiple visual streams yet differ in neural tuning by applying Bayesian non-negative matrix factorization to decompose fMRI responses into sparse, interpretable components across ventral, dorsal, and lateral streams. It introduces Sparse Component Alignment (SCA), a tuning-preserving representational similarity measure that uses these sparse components to compare brain representations with DNN activations. The analysis shows strong ventral alignment with image-trained networks, while dorsal and lateral streams exhibit markedly weaker alignment, revealing that traditional rotation-invariant metrics mask important axis-specific tuning. The findings provide a pathway for building more ecologically valid models of human vision, suggesting modality- or task-specific network designs (e.g., video-trained or social-cognition-inspired models) to better capture dorsal and lateral processes. Overall, the study highlights the value of sparse, component-based analyses for understanding brain–neural network correspondences and guiding the development of more targeted computational theories of vision.

Abstract

The ventral, dorsal, and lateral streams in high-level human visual cortex are implicated in distinct functional processes. Yet, deep neural networks (DNNs) trained on a single task model the entire visual system surprisingly well, hinting at common computational principles across these pathways. To explore this inconsistency, we applied a novel sparse decomposition approach to identify the dominant components of visual representations within each stream. Consistent with traditional neuroscience research, we find a clear difference in component response profiles across the three visual streams -- identifying components selective for faces, places, bodies, text, and food in the ventral stream; social interactions, implied motion, and hand actions in the lateral stream; and some less interpretable components in the dorsal stream. Building on this, we introduce Sparse Component Alignment (SCA), a new method for measuring representational alignment between brains and machines that better captures the latent neural tuning of these two visual systems. Using SCA, we find that standard visual DNNs are more aligned with the ventral than either dorsal or lateral representations. SCA reveals these distinctions with greater resolution than conventional population-level geometry, offering a measure of representational alignment that is sensitive to a system's underlying axes of neural tuning.

Paper Structure

This paper contains 22 sections, 14 equations, 10 figures, 1 algorithm.

Figures (10)

  • Figure 1: Schematic overview of the data-driven component modeling approach. (a) We used Bayesian non-negative matrix factorization (NMF) to decompose a given voxel × stimuli matrix into two lower-rank matrices representing component responses ${\bm{R}}$ and the corresponding weights of anatomical voxels ${\bm{W}}$. (b) For each iteration, a connectivity matrix ${\bm{C}}$ is created using rank-ordered component responses, where each cell of the connectivity matrix $c_{i,j}$ represents whether a pair of stimuli $i,j$ maximally load onto the same component. Binary matrices are averaged across all iterations to produce a single image connectivity matrix.
  • Figure 2: Simulations of latent component recovery and rotation sensitivity. Different methods are used to recover the latent components of simulated data $\mathbf{X}$. (a) A sparse decomposition finds the optimal mapping of original-to-inferred components (top, red-outlined matrix entries). Unlike sparse NMF (snmf), Bayesian NMF (bnmf) jointly infers sparsity in ${\bm{W}}$ and ${\bm{R}}$ (bottom, gray bars). (b) NMF but not PCA components are dissimilar after correcting for rotation, measured via Pearson's r. (Note: the PCA components overlap.) (c) Sparse component alignment (SCA) demonstrates a clear sensitivity to minor perturbations in the native axes of the representation; specifically, increasing the extent of axis rotations ($\mathbf{X}\rightarrow\mathbf{X}_r$), whether through larger angles or a greater number of 2D planes rotated, results in more substantial decreases in alignment.
  • Figure 3: Examination of data decomposition. (a) Explained variance of an example neural response matrix ${\bm{D}}$ in brain and models. (b) Bayesian priors produce sparse components in non-negative matrix factorization (NMF). Measured sparsity of example weight ${\bm{W}}$ (top) and response ${\bm{R}}$ (bottom) matrices of components derived from NMF, Bayesian NMF, and PCA in the brain and models. Note: bars for standard NMF are present but close to zero.
  • Figure 4: Component response profiles and preferred stimuli. Plots depicting the response profiles of the most consistent components across the four subjects in the (a) ventral (blue), (b) lateral (red), and (c) dorsal (green) streams. Each subplot shows the same 1,000 images (depicted as sticks) rank-ordered by their evoked component response (y-axis, a.u.) and colored by their average saliency rating to a component-specific prompt. The correlation between saliency ratings and component responses is provided in each subplot, along with three of the component's preferred stimuli. (d) Visualization of the anatomical masks used to demarcate each visual stream.
  • Figure 5: Alignment of deep neural networks (DNNs) to the brain. The measured alignment between visual representations in the brain (in the dorsal, lateral, and ventral streams) and the same set of 7 visual DNNs. The untrained model is in white, and pre-trained models are colored in various shades of grey. (a) From left to right, similarity is measured by linear encoding, representational similarity analysis (RSA), sparse component alignment (SCA), and the 1-1 component matching score (CMS). (b) Alignment between each pathway and intermediate layers of a pre-trained AlexNet model, using SCA.
  • ...and 5 more figures
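
The connectivity-matrix step described in the Figure 1(b) caption, and the rotation sensitivity described in the Figure 2(c) caption, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes the component responses are already available (a small hand-written matrix stands in for Bayesian NMF output), it reads "maximally load onto the same component" as "share the same arg-max component", and the function names `connectivity_matrix` and `averaged_connectivity` are hypothetical.

```python
import numpy as np

def connectivity_matrix(R):
    """Binary stimulus-by-stimulus matrix for one decomposition.

    R has shape (n_components, n_stimuli). Entry (i, j) is 1 when
    stimuli i and j have their largest response on the same component
    (an assumed reading of the Figure 1(b) caption).
    """
    top = np.argmax(R, axis=0)
    return (top[:, None] == top[None, :]).astype(float)

def averaged_connectivity(R_list):
    """Average the binary matrices over repeated decompositions."""
    return np.mean([connectivity_matrix(R) for R in R_list], axis=0)

# Toy component responses: 2 components x 4 stimuli (illustrative values).
R = np.array([[1.0, 0.9, 0.0, 0.1],
              [0.0, 0.1, 1.0, 0.8]])

# Stand-in for several iterations: the same R with small random jitter.
rng = np.random.default_rng(0)
R_list = [R + 0.01 * rng.random(R.shape) for _ in range(5)]
C = averaged_connectivity(R_list)

# Rotate the component axes by 60 degrees (a toy analogue of X -> X_r).
theta = np.deg2rad(60.0)
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
R_rot = Q @ R

# Stimulus-by-stimulus inner products ignore the rotation ...
gram_same = np.allclose(R.T @ R, R_rot.T @ R_rot)
# ... but the arg-max assignments, and hence the connectivity, do not.
C_orig = connectivity_matrix(R)
C_rot = connectivity_matrix(R_rot)
connectivity_changed = not np.array_equal(C_orig, C_rot)

print(C)
print(gram_same, connectivity_changed)
```

In this toy case the rotation leaves the pairwise geometry of the stimuli untouched while merging the two stimulus groups into one, which is the kind of axis-specific change a rotation-invariant comparison cannot register. How SCA scores the similarity of two connectivity matrices is not specified in the captions above, so that step is left out.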