Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream

Abdulkadir Gokce, Martin Schrimpf

TL;DR

This paper investigates how model size, dataset size, and compute influence alignment to the primate ventral visual stream. Using a controlled, from-scratch training protocol, the authors train more than 600 models on ImageNet and EcoSet, evaluate them on Brain-Score benchmarks, fit parametric power-law curves to misalignment, and derive compute-optimal allocations between model size and dataset size. The key findings are that behavioral alignment scales with data and compute while neural alignment saturates, that higher visual areas benefit most from scaling, and that scaling data improves brain alignment more than scaling model size. The study concludes that scaling alone is insufficient to model the brain's ventral stream and highlights the need for novel architectures, training strategies, and brain-data integration. Code and model checkpoints are released as open source for replication and future work.

Abstract

When trained on large-scale object classification datasets, certain artificial neural network models begin to approximate core object recognition behaviors and neural response patterns in the primate brain. While recent machine learning advances suggest that scaling compute, model size, and dataset size improves task performance, the impact of scaling on brain alignment remains unclear. In this study, we explore scaling laws for modeling the primate visual ventral stream by systematically evaluating over 600 models trained under controlled conditions on benchmarks spanning V1, V2, V4, IT and behavior. We find that while behavioral alignment continues to scale with larger models, neural alignment saturates. This observation remains true across model architectures and training datasets, even though models with stronger inductive biases and datasets with higher-quality images are more compute-efficient. Increased scaling is especially beneficial for higher-level visual areas, where small models trained on few samples exhibit only poor alignment. Our results suggest that while scaling current architectures and datasets might suffice for alignment with human core object recognition behavior, it will not yield improved models of the brain's visual ventral stream, highlighting the need for novel strategies in building brain models.
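The power-law fits to misalignment mentioned in the TL;DR can be illustrated with a small sketch. It assumes a saturating form L(C) = E + A * C^(-alpha), where L is misalignment (1 minus the alignment score), C is compute, and E is an irreducible floor; this summary does not give the authors' exact parametrization or fitting procedure, so the function name, the grid search over E, and the synthetic data below are all hypothetical.

```python
import math

def fit_saturating_power_law(compute, misalignment, n_grid=200):
    """Fit L(C) = E + A * C**(-alpha) (assumed form, not the authors' code).

    E is chosen by grid search; for each candidate E, A and alpha come
    from ordinary least squares on log(L - E) versus log(C).
    """
    l_min = min(misalignment)
    xs = [math.log(c) for c in compute]
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    best = None
    for i in range(n_grid):
        # Candidate floor E, kept strictly below the smallest observation
        # so that log(L - E) is always defined.
        e = l_min * i / n_grid
        ys = [math.log(l - e) for l in misalignment]
        mean_y = sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        a = math.exp(mean_y - slope * mean_x)
        alpha = -slope
        # Score each candidate by squared error in the original (linear) space.
        sse = sum((e + a * c ** (-alpha) - l) ** 2
                  for c, l in zip(compute, misalignment))
        if best is None or sse < best[0]:
            best = (sse, e, a, alpha)
    _, e, a, alpha = best
    return e, a, alpha

# Synthetic data generated from known parameters (purely illustrative).
true_e, true_a, true_alpha = 0.4, 2.0, 0.3
compute = [10.0 ** k for k in range(1, 9)]
misalignment = [true_e + true_a * c ** (-true_alpha) for c in compute]

e_hat, a_hat, alpha_hat = fit_saturating_power_law(compute, misalignment)
print(f"E={e_hat:.3f}, A={a_hat:.3f}, alpha={alpha_hat:.3f}")
```

Under this assumed form, a fitted E well above zero corresponds to the saturation reported for neural alignment, while an E near zero corresponds to the continued improvement reported for behavioral alignment.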

Paper Structure

This paper contains 33 sections, 9 equations, 14 figures.

Figures (14)

  • Figure 1: a) For a given compute budget ($C$), we determine the scaling laws for maximal neural and behavioral alignment to the primate visual ventral stream. b) We find consistent scaling laws for brain and behavioral alignment across over 600 models. While we predict models to approach perfect behavioral alignment at large scales, the effect of scaling on brain alignment is already saturating.
  • Figure 2: Scaling Model Size. a) Neural and behavioral alignments of four architecture families. Models with inductive biases (ResNet, EfficientNet) are more compute-efficient than less constrained models (ConvNeXt, ViT). b) Average alignment per model architecture. All models converge to similar alignments. c) Increasing parameters improves alignment (models trained on full datasets), but the effects saturate.
  • Figure 3: Scaling Dataset Size. a) Training on larger datasets enhances brain alignment. The alignment scaling curves derived from ImageNet and EcoSet closely estimate the alignment achieved when using ImageNet21k. In contrast, datasets with specialized image distributions, such as Places365, fall below the alignment scaling laws established by these generalist datasets. In the extreme case of handwritten digits (infiMNIST), training has only a minuscule impact on alignment. b) Model families with weaker inductive bias start at a lower alignment and require more data to improve.
  • Figure 4: Optimal Compute Allocation. a) Alignment as a function of both model and training dataset sizes. Marker size is log-proportional to model size. Compute should be spent 0.3/0.7 on model/dataset size respectively. b) Models start out at different alignments but converge to the same saturating point.
  • Figure 5: Graded Effect of Scale across Cortical Hierarchy. a) Alignment as a function of training compute across different brain regions. Group 1 contains most models except those with low inductive bias (Group 2; ConvNeXt, ViT). b) Alignment gain per region, defined as $A {10}^\alpha$. Regions higher in the cortical hierarchy show greater benefits from increased compute (Behavior $>$ IT $>$ V4 $>$ V2 $>$ V1).
  • ...and 9 more figures
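
The 0.3/0.7 compute split in the Figure 4 caption can be made concrete with a short sketch. It assumes the split acts as a pair of exponents, with model size growing as C^0.3 and dataset size as C^0.7, so that their product tracks the compute budget C. That reading is an interpretation of the caption rather than something this summary states, and the function name is hypothetical.

```python
def scale_allocation(compute_multiplier, model_exponent=0.3, data_exponent=0.7):
    """Return (model_multiplier, data_multiplier) for a given compute increase.

    Assumes model size scales as C**model_exponent and dataset size as
    C**data_exponent; the exponents are read off the Figure 4 caption.
    """
    model_multiplier = compute_multiplier ** model_exponent
    data_multiplier = compute_multiplier ** data_exponent
    return model_multiplier, data_multiplier

# Example: how a tenfold compute increase would be divided under this reading.
model_x, data_x = scale_allocation(10.0)
print(f"model size x{model_x:.2f}, dataset size x{data_x:.2f}")
```

Under this reading, most of any additional compute goes to more training samples rather than more parameters, which matches the summary's statement that data scaling outperforms model scaling.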