Sparse components distinguish visual pathways & their alignment to neural networks
Ammar I Marvi, Nancy G Kanwisher, Meenakshi Khosla
TL;DR
Deep neural networks (DNNs) trained for object recognition align with multiple visual streams, yet their neural tuning can still differ from the brain's. To examine this, the work applies Bayesian non-negative matrix factorization to decompose fMRI responses into sparse, interpretable components in the ventral, dorsal, and lateral streams. It then introduces Sparse Component Alignment (SCA), a tuning-preserving representational similarity measure that uses these sparse components to compare brain representations with DNN activations. The analysis shows strong alignment between image-trained networks and the ventral stream, and markedly weaker alignment with the dorsal and lateral streams, a distinction that traditional rotation-invariant metrics mask because they are insensitive to axis-specific tuning. The findings point toward more ecologically valid models of human vision, suggesting modality- or task-specific network designs (e.g., video-trained or social-cognition-inspired models) to better capture dorsal and lateral processes. Overall, the study highlights the value of sparse, component-based analyses for understanding brain–neural network correspondences and for guiding more targeted computational theories of vision.
Abstract
The ventral, dorsal, and lateral streams in high-level human visual cortex are implicated in distinct functional processes. Yet, deep neural networks (DNNs) trained on a single task model the entire visual system surprisingly well, hinting at common computational principles across these pathways. To explore this inconsistency, we applied a novel sparse decomposition approach to identify the dominant components of visual representations within each stream. Consistent with traditional neuroscience research, we find a clear difference in component response profiles across the three visual streams -- identifying components selective for faces, places, bodies, text, and food in the ventral stream; social interactions, implied motion, and hand actions in the lateral stream; and some less interpretable components in the dorsal stream. Building on this, we introduce Sparse Component Alignment (SCA), a new method for measuring representational alignment between brains and machines that better captures the latent neural tuning of these two visual systems. Using SCA, we find that standard visual DNNs are more aligned with the ventral than either dorsal or lateral representations. SCA reveals these distinctions with greater resolution than conventional population-level geometry, offering a measure of representational alignment that is sensitive to a system's underlying axes of neural tuning.
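The contrast between population-level geometry and axis-sensitive alignment can be illustrated with a small synthetic sketch. It is not the authors' SCA, whose definition is not given in this section. It assumes linear CKA as a stand-in for a rotation-invariant metric and uses a hypothetical axis-matching score (`best_axis_match`, the mean best absolute correlation between individual axes). The data, function names, and parameters are all illustrative assumptions.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two (stimuli x units) matrices.

    Unchanged when either matrix is rotated by an orthogonal transform.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

def best_axis_match(X, Y):
    """Hypothetical axis-sensitive score (an assumption, not SCA).

    For each column (axis) of X, take the largest absolute Pearson
    correlation with any column of Y, then average over X's columns.
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    Yz = (Y - Y.mean(axis=0)) / Y.std(axis=0)
    corr = Xz.T @ Yz / X.shape[0]
    return np.abs(corr).max(axis=1).mean()

rng = np.random.default_rng(0)
n_stimuli, n_axes = 500, 8

# Synthetic sparse, non-negative "tuning" axes: each stimulus drives few axes.
mask = rng.random((n_stimuli, n_axes)) < 0.2
X = rng.exponential(size=(n_stimuli, n_axes)) * mask

# Random orthogonal rotation: same population geometry, scrambled axes.
Q, _ = np.linalg.qr(rng.normal(size=(n_axes, n_axes)))
Y = X @ Q

cka_rotated = linear_cka(X, Y)         # equals 1 up to floating-point error
axis_same = best_axis_match(X, X)      # equals 1 up to floating-point error
axis_rotated = best_axis_match(X, Y)   # lower: individual axes no longer line up

print(f"linear CKA (X vs rotated X):  {cka_rotated:.3f}")
print(f"axis match (X vs X):          {axis_same:.3f}")
print(f"axis match (X vs rotated X):  {axis_rotated:.3f}")
```

In this toy setting the rotated system is indistinguishable from the original under linear CKA, while the axis-matching score drops. This is the kind of difference in tuning that a rotation-invariant metric cannot register.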
