Overcoming Output Dimension Collapse: When Sparsity Enables Zero-shot Brain-to-Image Reconstruction at Small Data Scales
Kenya Otsuka, Yoshihiro Nagano, Yukiyasu Kamitani
TL;DR
This paper tackles the challenge of zero-shot brain-to-image reconstruction under severe data scarcity by comparing naive and sparse linear translators within a Translator–Generator pipeline. It shows that naive regression inherently suffers output dimension collapse (ODC) at small data scales: predictions are confined to the span of the training outputs, which creates an irreducible error in the latent features. By contrast, sparse brain-to-feature mappings can extend predictions beyond this subspace, and the authors derive analytic expressions for the prediction error in a student–teacher setting that show when sparsity yields gains. Empirical analysis on real fMRI data (Deeprecon) confirms that ODC is a substantial contributor to prediction error, but also shows that the gains from sparsity are context-dependent, improving with lower noise and appropriately structured feature representations. Collectively, the work provides quantitative diagnostics for ODC and actionable guidelines for translator design and measurement strategies to enable more robust zero-shot brain decoding.
Abstract
Advances in brain-to-image reconstruction are enabling us to externalize the subjective visual experiences encoded in the brain as images. A key challenge in this task is data scarcity: a translator that maps brain activity to latent image features is trained on a limited number of brain-image pairs, making the translator a bottleneck for zero-shot reconstruction beyond the training stimuli. In this paper, we provide a theoretical analysis of two translator designs widely used in recent reconstruction pipelines: naive multivariate linear regression and sparse multivariate linear regression. We define the data scale as the ratio of the number of training samples to the latent feature dimensionality and characterize the behavior of each model across data scales. We first show that the naive linear regression model, which uses a shared set of input variables for all outputs, suffers from "output dimension collapse" at small data scales, restricting generalization beyond the training data. We then analyze sparse linear regression models in a student–teacher framework and derive expressions for the prediction error in terms of data scale and other sparsity-related parameters. Our analysis clarifies when variable selection can reduce prediction error at small data scales by exploiting the sparsity of the brain-to-feature mapping. Our findings provide quantitative guidelines for diagnosing output dimension collapse and for designing effective translators and feature representations for zero-shot reconstruction.
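
The output dimension collapse described above can be illustrated with a small synthetic experiment. The sketch below is not the authors' code: it assumes a ridge-regularized multivariate linear regression as the "naive" translator, and all sizes, the noise level, and the helper names (`ridge_fit`, `out_of_span_fraction`) are hypothetical choices made for illustration. With a shared set of inputs, the prediction for any new sample is x^T (X^T X + lam I)^{-1} X^T Y, that is, a weighted sum of the training output rows, so it cannot leave their span when the number of training samples is smaller than the output dimensionality.

```python
import numpy as np


def ridge_fit(X, Y, lam=1.0):
    """Multivariate ridge regression with one shared input set for all outputs."""
    d_in = X.shape[1]
    # Closed-form solution W = (X^T X + lam I)^{-1} X^T Y, shape (d_in, d_out).
    return np.linalg.solve(X.T @ X + lam * np.eye(d_in), X.T @ Y)


def out_of_span_fraction(V, Y_train):
    """Fraction of each row's squared norm lying outside the row space of Y_train."""
    # Orthonormal basis for the span of the training outputs, shape (d_out, n_train).
    Q, _ = np.linalg.qr(Y_train.T)
    residual = V - (V @ Q) @ Q.T
    return np.sum(residual**2, axis=1) / np.sum(V**2, axis=1)


# Hypothetical sizes: far fewer training samples than output dimensions.
rng = np.random.default_rng(0)
n_train, n_test, d_in, d_out = 20, 5, 50, 100

# Synthetic linear "teacher" mapping inputs to latent features (an assumption).
W_true = rng.standard_normal((d_in, d_out))
X_train = rng.standard_normal((n_train, d_in))
X_test = rng.standard_normal((n_test, d_in))
Y_train = X_train @ W_true + 0.1 * rng.standard_normal((n_train, d_out))
Y_test = X_test @ W_true

W_hat = ridge_fit(X_train, Y_train)
Y_pred = X_test @ W_hat

frac_pred = out_of_span_fraction(Y_pred, Y_train)  # numerically zero
frac_true = out_of_span_fraction(Y_test, Y_train)  # clearly nonzero

print("out-of-span fraction, predictions:", frac_pred)
print("out-of-span fraction, true targets:", frac_true)
```

In this toy setting the predictions carry essentially no energy outside the training-output span, while the true test targets do; that missing component is the part of the error the naive translator cannot reduce, whatever its weights. A per-output variable selection step, as in the sparse translators analyzed in the paper, breaks the shared-input structure that forces this confinement.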
