Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex

Spandan Madan, Will Xiao, Mingran Cao, Hanspeter Pfister, Margaret Livingstone, Gabriel Kreiman

TL;DR

The study asks how well DNN-based encoding models generalize to out-of-distribution (OOD) images when predicting macaque IT neural responses. It introduces MacaqueITBench, a large-scale dataset of neural responses to over 300k image presentations (8,233 unique natural images) across ventral-stream areas, enabling systematic OOD splits, and shows that even simple, low-level distribution shifts degrade neural predictivity. Using a linear ridge regression mapping from pre-trained DNN features (across eight architectures) to IT firing rates, the authors quantify OOD generalization gaps and demonstrate that a simple distance metric on image representations, the Closest Cosine Distance $D_{CCD}$, explains the OOD performance drop more robustly than alternatives such as $D_{MMD}$ or $D_{Cov}$. These results reveal a fundamental limitation of current encoding approaches and provide a practical metric and public benchmark to guide future, data-efficient improvements in neural modeling.
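This excerpt does not spell out how $D_{CCD}$ is computed; a plausible reading, consistent with its name, is the cosine distance from each held-out image to its closest training image in the feature space of a pre-trained recognition model, averaged over the test set. A minimal sketch under that assumption (all function and variable names are illustrative):

```python
import numpy as np

def closest_cosine_distance(train_feats: np.ndarray, test_feats: np.ndarray) -> float:
    """Mean cosine distance from each test image to its nearest training image.

    train_feats: (n_train, d) features from a pre-trained recognition model.
    test_feats:  (n_test, d) features for the held-out (OOD) images.
    """
    # L2-normalize rows so that dot products equal cosine similarities.
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T  # (n_test, n_train) cosine similarities
    # Distance to the *closest* training image, averaged over the test set.
    return float(np.mean(1.0 - sims.max(axis=1)))
```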

Abstract

We characterized the generalization capabilities of DNN-based encoding models when predicting neuronal responses from the visual cortex. We collected \textit{MacaqueITBench}, a large-scale dataset of neural population responses from the macaque inferior temporal (IT) cortex to over $300,000$ images, comprising $8,233$ unique natural images presented to seven monkeys over $109$ sessions. Using \textit{MacaqueITBench}, we investigated the impact of distribution shifts on models predicting neural activity by dividing the images into Out-Of-Distribution (OOD) train and test splits. The OOD splits were based on several image-computable attributes, including image contrast, hue, intensity, temperature, and saturation. Compared to the performance on in-distribution test images -- the conventional way these models have been evaluated -- models performed worse at predicting neuronal responses to out-of-distribution images, retaining as little as $20\%$ of the performance on in-distribution test images. The generalization performance under OOD shifts can be well accounted for by a simple image similarity metric -- the cosine distance between image representations extracted from a pre-trained object recognition model is a strong predictor of neural predictivity under different distribution shifts. The dataset of images, neuronal firing rate recordings, and computational benchmarks are hosted publicly at: https://bit.ly/3zeutVd.
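The abstract describes the standard encoding pipeline: image features from a pre-trained DNN are mapped to neuronal firing rates with a linear (ridge) model, and predictivity is compared between in-distribution and OOD test splits. Below is a minimal sketch of that pipeline, assuming scikit-learn's RidgeCV and mean per-electrode Pearson correlation as the predictivity score; the paper's exact regularization and scoring choices are not given in this excerpt.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

def neural_predictivity(train_feats, train_rates, test_feats, test_rates):
    """Fit a ridge regression from DNN features to firing rates and
    return the mean per-electrode Pearson r on the test split."""
    model = RidgeCV(alphas=np.logspace(-2, 6, 9))  # cross-validated ridge penalty
    model.fit(train_feats, train_rates)            # rates: (n_images, n_electrodes)
    pred = model.predict(test_feats)
    r = [np.corrcoef(pred[:, i], test_rates[:, i])[0, 1]
         for i in range(test_rates.shape[1])]
    return float(np.mean(r))

# The ratio plotted in Figure 3: OOD predictivity relative to InD predictivity.
# ood_ratio = neural_predictivity(X_tr, Y_tr, X_ood, Y_ood) / \
#             neural_predictivity(X_tr, Y_tr, X_ind, Y_ind)
```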

Paper Structure

This paper contains 16 sections, 4 equations, 10 figures.

Figures (10)

  • Figure 1: Modeling the visual cortex with MacaqueITBench. (a) DNN-based models of the visual cortex employ a linear model to map image features extracted from pre-trained DNNs (e.g., ResNet18) to neuronal responses collected from the macaque IT cortex. (b) A UMAP [mcinnes2018umap] visualization of the neural pseudo-population representation; nearby images evoke more similar population responses. (c) An example one-second segment of the raw wideband signals recorded on an electrode. (d) The wideband signals were high-pass filtered, and threshold-crossing events below a voltage value (horizontal dashed line) were counted as multiunit spikes (lower vertical ticks). The top horizontal bars indicate image presentation periods. (e) The heatmap shows the neural response matrix. Each row indicates the responses from an electrode, pooled across sessions. The columns correspond to images, sorted by the reverse UMAP horizontal order. The vertical bars to the left of the heatmap denote the recorded areas (black lines) and monkeys (colored lines).
  • Figure 2: Constructing multiple attribute-based OOD splits. For each of our $109$ sessions, we construct $15$ different attribute-based OOD splits. These correspond to $3$ hold-out strategies (high, low, mid) for each of $5$ image-computable attributes (hue, contrast, saturation, intensity, temperature). For each attribute (e.g., hue), we compute the attribute value for each image in the session. For the high hold-out strategy, all images with the attribute value above a percentile cut-off serve as the OOD test set, with the remaining images serving as the train set. Analogously, for the low hold-out splits, images below a percentile cut-off serve as the test set. For mid hold-out splits, images within the middle percentiles serve as the test set (this percentile-based construction is sketched in code after the figure list).
  • Figure 3: Neural predictivity drops under distribution shifts. The y-axis shows the ratio of the neural predictivity for out-of-distribution (OOD) images to in-distribution (InD) test images. A ratio of $1$ would indicate no drop in performance. Each panel (a-h) shows a different architecture used for extracting image features. Each bar in a panel corresponds to a different OOD split, constructed using the high hold-out strategy for one of $5$ attributes (hue, saturation, intensity, temperature, and contrast). For all architectures and OOD splits, models fail to generalize well to OOD samples and are significantly and substantially below the $1.0$ horizontal line. Image features were extracted from the pre-final layer for all architectures.
  • Figure 4: Neural predictivity drops for different model layers as well. Neural predictivity on OOD samples is reported for multiple DNN architectures across multiple layers. The layer name is indicated alongside the architecture in each panel (a-h). All OOD splits reported here were constructed using the high hold-out strategy. For all architectures, layers, and OOD splits, models fail to generalize well to OOD samples and are significantly below the $1.0$ horizontal line.
  • Figure 5: Neural predictivity drops for the low hold-out strategy as well. Neural predictivity is reported on OOD test splits constructed using the low hold-out strategy. Across all DNN architectures and image-computable attributes, performance is below $1.0$ for all panels (a-h). Thus, models also fail to generalize to OOD splits constructed with the low hold-out strategy.
  • ...and 5 more figures
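Figure 2's split construction reduces to a percentile threshold on a per-image attribute. Here is a minimal sketch of the high, low, and mid hold-out strategies; the percentage of images held out is an assumed parameter, since the exact percentile cut-offs are not stated in this excerpt.

```python
import numpy as np

def ood_split(attr_values: np.ndarray, strategy: str, holdout_pct: float = 20.0):
    """Split one session's image indices into train / OOD-test sets.

    attr_values: per-image attribute value (e.g., hue, contrast) for the session.
    strategy:    'high', 'low', or 'mid' hold-out, as in Figure 2.
    holdout_pct: percentage of images held out (illustrative default).
    """
    idx = np.arange(len(attr_values))
    if strategy == "high":    # hold out the images with the highest values
        test = idx[attr_values >= np.percentile(attr_values, 100 - holdout_pct)]
    elif strategy == "low":   # hold out the images with the lowest values
        test = idx[attr_values <= np.percentile(attr_values, holdout_pct)]
    elif strategy == "mid":   # hold out the middle band of values
        lo = np.percentile(attr_values, 50 - holdout_pct / 2)
        hi = np.percentile(attr_values, 50 + holdout_pct / 2)
        test = idx[(attr_values >= lo) & (attr_values <= hi)]
    else:
        raise ValueError(f"unknown hold-out strategy: {strategy}")
    train = np.setdiff1d(idx, test)  # everything not held out is the train set
    return train, test
```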