Benchmarking Out-of-Distribution Generalization Capabilities of DNN-based Encoding Models for the Ventral Visual Cortex
Spandan Madan, Will Xiao, Mingran Cao, Hanspeter Pfister, Margaret Livingstone, Gabriel Kreiman
TL;DR
The study addresses how well DNN-based encoding models generalize to out-of-distribution (OOD) images when predicting macaque IT neural responses. It introduces MacaqueITBench, a large-scale dataset of neural responses to over 300k images (8,233 unique natural images) recorded across ventral-stream areas, which enables systematic OOD splits, and shows that even simple distribution shifts in low-level image attributes degrade neural predictivity. Using a linear ridge-regression mapping from pre-trained DNN features (across eight architectures) to IT firing rates, the authors quantify OOD generalization gaps and demonstrate that a simple distance metric on image representations, the Closest Cosine Distance $D_{CCD}$, robustly explains the drop in OOD performance better than $D_{MMD}$ or $D_{Cov}$. These results reveal a fundamental limitation of current encoding approaches and provide a practical metric and public benchmark to guide future data-efficient improvements in neural modeling.
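To make the idea of a closest-cosine-distance metric concrete, the following is a minimal sketch, not the authors' implementation. It assumes $D_{CCD}$ is computed as the cosine distance from each test image to its nearest training image in a feature space, averaged over test images; the exact definition, the function name `closest_cosine_distance`, and the synthetic features are all assumptions for illustration.

```python
import numpy as np

def closest_cosine_distance(train_feats, test_feats):
    """Mean, over test images, of the cosine distance to the nearest training image.

    Assumed definition for illustration; the paper's exact formula may differ.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    train = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
    test = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    sims = test @ train.T                 # shape: (n_test, n_train)
    nearest = 1.0 - sims.max(axis=1)      # distance to the closest training image
    return float(nearest.mean())

# Synthetic stand-ins for DNN features (hypothetical data, not MacaqueITBench).
rng = np.random.default_rng(0)
train_feats = rng.normal(size=(200, 64))
near_test = train_feats[:50] + 0.05 * rng.normal(size=(50, 64))  # close to training set
far_test = rng.normal(size=(50, 64)) + 3.0                       # shifted distribution

d_near = closest_cosine_distance(train_feats, near_test)
d_far = closest_cosine_distance(train_feats, far_test)
print(f"near split: {d_near:.4f}, shifted split: {d_far:.4f}")
```

Under this assumed definition, a test split that lies close to the training images yields a small distance, while a shifted split yields a larger one, which is the property that would let such a metric track OOD performance drops.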
Abstract
We characterized the generalization capabilities of DNN-based encoding models when predicting neuronal responses from the visual cortex. We collected \textit{MacaqueITBench}, a large-scale dataset of neural population responses from the macaque inferior temporal (IT) cortex to over $300{,}000$ images, comprising $8{,}233$ unique natural images presented to seven monkeys over $109$ sessions. Using \textit{MacaqueITBench}, we investigated the impact of distribution shifts on models predicting neural activity by dividing the images into Out-Of-Distribution (OOD) train and test splits. The OOD splits covered several image-computable shift types, including image contrast, hue, intensity, temperature, and saturation. Compared to the performance on in-distribution test images -- the conventional way these models have been evaluated -- models performed worse at predicting neuronal responses to out-of-distribution images, retaining as little as $20\%$ of the performance on in-distribution test images. The generalization performance under OOD shifts can be well accounted for by a simple image similarity metric: the cosine distance between image representations extracted from a pre-trained object recognition model is a strong predictor of neural predictivity under different distribution shifts. The dataset of images, neuronal firing rate recordings, and computational benchmarks are hosted publicly at: https://bit.ly/3zeutVd.
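The evaluation protocol described above (hold out images by an image-computable attribute, fit a linear mapping from features to firing rates, then compare in-distribution and OOD predictivity) can be sketched as follows. This is an illustrative sketch on synthetic data, not the authors' pipeline: the contrast values, feature matrix, simulated firing rates, the 80th-percentile threshold, and the ridge penalty `alpha` are all hypothetical, and the numbers it prints say nothing about the results reported in the paper.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, y_mean - x_mean @ W

def pearson_per_neuron(Y_true, Y_pred):
    """Pearson correlation between true and predicted responses, per neuron."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    return (yt * yp).sum(axis=0) / (np.linalg.norm(yt, axis=0) * np.linalg.norm(yp, axis=0))

# Synthetic stand-ins (hypothetical): per-image contrast, DNN features, firing rates.
rng = np.random.default_rng(0)
n_images, n_feats, n_neurons = 1000, 32, 10
contrast = rng.uniform(size=n_images)
feats = (0.5 + 2.0 * contrast)[:, None] * rng.normal(size=(n_images, n_feats))
W_true = rng.normal(size=(n_feats, n_neurons))
rates = np.tanh(feats @ W_true / np.sqrt(n_feats)) + 0.1 * rng.normal(size=(n_images, n_neurons))

# OOD split: the highest-contrast images are held out entirely from training.
threshold = np.quantile(contrast, 0.8)
ood_idx = np.flatnonzero(contrast > threshold)
in_idx = rng.permutation(np.flatnonzero(contrast <= threshold))
n_id_test = len(in_idx) // 5
id_test_idx, train_idx = in_idx[:n_id_test], in_idx[n_id_test:]

# Fit on low-contrast training images, evaluate on both test sets.
W, b = ridge_fit(feats[train_idx], rates[train_idx], alpha=10.0)
r_id = pearson_per_neuron(rates[id_test_idx], feats[id_test_idx] @ W + b)
r_ood = pearson_per_neuron(rates[ood_idx], feats[ood_idx] @ W + b)

# Fraction of in-distribution predictivity retained under the shift.
retained = r_ood.mean() / r_id.mean()
print(f"ID r = {r_id.mean():.3f}, OOD r = {r_ood.mean():.3f}, retained = {retained:.2f}")
```

The ratio `retained` corresponds to the kind of quantity the abstract reports as a percentage of in-distribution performance; the same split logic would apply to hue, intensity, temperature, or saturation by swapping the attribute used for thresholding.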
