
BrainBits: How Much of the Brain are Generative Reconstruction Methods Using?

David Mayo, Christopher Wang, Asa Harbin, Abdulrahman Alabdulkareem, Albert Eaton Shaw, Boris Katz, Andrei Barbu

TL;DR

BrainBits uses a bottleneck to quantify how much of the signal extracted from neural recordings is actually necessary to reproduce a method's reconstruction fidelity. It finds that surprisingly little information from the brain is needed to produce high-fidelity reconstructions.

Abstract

When evaluating stimuli reconstruction results it is tempting to assume that higher fidelity text and image generation is due to an improved understanding of the brain or more powerful signal extraction from neural recordings. However, in practice, new reconstruction methods could improve performance for at least three other reasons: learning more about the distribution of stimuli, becoming better at reconstructing text or images in general, or exploiting weaknesses in current image and/or text evaluation metrics. Here we disentangle how much of the reconstruction is due to these other factors vs. productively using the neural recordings. We introduce BrainBits, a method that uses a bottleneck to quantify the amount of signal extracted from neural recordings that is actually necessary to reproduce a method's reconstruction fidelity. We find that it takes surprisingly little information from the brain to produce reconstructions with high fidelity. In these cases, it is clear that the priors of the methods' generative models are so powerful that the outputs they produce extrapolate far beyond the neural signal they decode. Given that reconstructing stimuli can be improved independently by either improving signal extraction from the brain or by building more powerful generative models, improving the latter may fool us into thinking we are improving the former. We propose that methods should report a method-specific random baseline, a reconstruction ceiling, and a curve of performance as a function of bottleneck size, with the ultimate goal of using more of the neural recordings.
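The "curve of performance as a function of bottleneck size" proposed in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it swaps in reduced-rank ridge regression as a closed-form stand-in for a learned compression and prediction mapping, and the function names (`fit_linear_bottleneck`, `bottleneck_curve`), the ridge value, and the synthetic data are all assumptions made for illustration. Test-set mean squared error stands in for whatever reconstruction metric a real method would report.

```python
import numpy as np

def fit_linear_bottleneck(X, Y, L, ridge=1e-3):
    """Fit a rank-L linear map X -> Y (reduced-rank ridge regression).

    Returns (G, F): G compresses voxels to L dimensions and F maps the
    L-dimensional code to the target latents. Assumes roughly
    zero-mean X and Y (hypothetical helper, not the paper's training code).
    """
    n_voxels = X.shape[1]
    # Full-rank ridge solution, shape (n_voxels, n_targets).
    B = np.linalg.solve(X.T @ X + ridge * np.eye(n_voxels), X.T @ Y)
    # Principal directions of the fitted targets define the bottleneck.
    _, _, Vt = np.linalg.svd(X @ B, full_matrices=False)
    V = Vt[:L].T                      # (n_targets, L)
    return B @ V, V.T                 # G: (n_voxels, L), F: (L, n_targets)

def bottleneck_curve(X_train, Y_train, X_test, Y_test, sizes):
    """Held-out MSE for each bottleneck size L in `sizes`."""
    curve = {}
    for L in sizes:
        G, F = fit_linear_bottleneck(X_train, Y_train, L)
        Z = X_test @ G                # bottleneck code, shape (n_test, L)
        curve[L] = float(np.mean((Z @ F - Y_test) ** 2))
    return curve

# Synthetic stand-in data: 40 "voxels", 10 target latents, and only
# 3 dimensions of usable signal plus noise. Not real recordings.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))
Y = X @ rng.normal(size=(40, 3)) @ rng.normal(size=(3, 10))
Y = Y + 0.1 * rng.normal(size=Y.shape)

curve = bottleneck_curve(X[:200], Y[:200], X[200:], Y[200:], sizes=[1, 2, 3, 5, 10])
for L, mse in curve.items():
    print(f"L={L:2d}  test MSE={mse:.4f}")
```

In this toy setting the error should stop improving once the bottleneck is as wide as the true signal, which is the kind of plateau such a curve is meant to expose. A real analysis would also report the method-specific random baseline and the reconstruction ceiling alongside it.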

Paper Structure

This paper contains 22 sections, 1 equation, 16 figures, 3 tables.

Figures (16)

  • Figure 1: BrainBits bottlenecking framework as applied to BrainDiffuser. The goal of image reconstruction is to generate an image based on brain signal. The brain signal is mapped to a hidden vector (gold) by a compression mapping $g_L$, which is then used to predict VDVAE, CLIP-text, and CLIP-vision latents via a mapping $f_L$. As in ozcelik2022reconstruction, these latents are used to produce the final reconstruction. In our studies, we restrict the information available from the brain by varying the dimension of the hidden vector.
  • Figure 2: High-quality stimuli can be reconstructed from a fraction of the data. Shown here are images and text reconstructed for several bottleneck sizes using our BrainBits approach. Images and text are shown for subject 1 for all three methods. Examples where the original methods could reasonably reconstruct the stimuli were chosen; the same images for both visual methods are shown in the appendix. As the bottleneck dimension increases, the accuracy of the reconstruction increases. Although there are differences between the full and bottlenecked ($d=50$) results, the reconstructions are surprisingly comparable, despite the fact that the full reconstruction methods have $>14,000$ voxels available to them. Text reconstructions are harder to evaluate in this qualitative manner; later we present a quantitative evaluation.
  • Figure 4: How large are the bottlenecks? Even though the bottleneck representations have $L$ dimensions, it is not necessarily the case that all dimensions are used by the bottleneck mapping. For both language and vision, we can measure the effective dimensionality to get a sense for how much of the channel capacity is being used. For BrainDiffuser, the effective dimensionality is comparable to the bottleneck size, showing that information is being extracted from the neural recordings up to about 15-20 dimensions. For language bottlenecks, effective dimensionality remains low, showing that little of the channel capacity, and therefore little of the neural signal, is being used.
  • Figure 5: What areas of the brain help reconstruction the most? Models quickly zoom in on useful areas even at low bottleneck sizes. Note that for clarity the color bar cuts off at 1e-6; values above that are all orange. In this case, BrainDiffuser on subject 1 attends to peripheral areas of the early visual system. As the bottleneck size goes up, models exploit those original areas but do not meaningfully expand to new areas. Ideally, one would hope to see more of the brain playing an important role with larger bottleneck sizes; this is not what BrainBits uncovers.
  • Figure 6: What information do bottlenecks contain? For the BrainDiffuser approach we compute the decodability of four different features (object class, brightness, RMS contrast, and the average gradient magnitude) as a function of bottleneck size. Object class refers to decoding the class of the largest object in the image, which is often the focus of the image. The average gradient magnitude is a proxy for the edge energy in the image. Dashed lines in plot (a) indicate 1-out-of-61 classification chance, 1.6%. Dashed lines on plots (b, c, d) indicate the metric's MSE distance from the average metric value on the training set. Larger bottlenecks are needed to extract more object class information above chance. Edge energy, brightness, and contrast are mostly exhausted early. Looking at features as a function of bottleneck size can reveal what types of interpretable features models learn, offering some explanation as to why performance goes up as a function of bottleneck size.
  • ...and 11 more figures
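
The low-level image features named in the Figure 6 caption (brightness, RMS contrast, average gradient magnitude) and its dashed-line baseline can be sketched as follows. The exact definitions used in the paper are not given in this section, so the code assumes common conventions: a grayscale image with float intensities, brightness as mean intensity, RMS contrast as the standard deviation of intensity, and gradient magnitude from finite differences. All function names are hypothetical.

```python
import numpy as np

def brightness(img):
    """Mean pixel intensity of a 2-D grayscale image (assumed definition)."""
    return float(np.mean(img))

def rms_contrast(img):
    """Standard deviation of pixel intensity (a common RMS-contrast definition)."""
    return float(np.std(img))

def avg_gradient_magnitude(img):
    """Mean finite-difference gradient magnitude, a proxy for edge energy."""
    gy, gx = np.gradient(img.astype(float))   # derivatives along rows, columns
    return float(np.mean(np.hypot(gx, gy)))

def mean_baseline_mse(train_values, test_values):
    """MSE of always predicting the training-set mean of a feature.

    This mirrors the caption's description of the dashed lines in
    plots (b, c, d); the paper's exact computation is an assumption here.
    """
    train_mean = np.mean(train_values)
    return float(np.mean((np.asarray(test_values) - train_mean) ** 2))

# Toy image: a horizontal intensity ramp from 0 to 1 (illustrative only).
ramp = np.tile(np.linspace(0.0, 1.0, 5), (5, 1))
features = {
    "brightness": brightness(ramp),
    "rms_contrast": rms_contrast(ramp),
    "avg_gradient_magnitude": avg_gradient_magnitude(ramp),
}
print(features)
```

A decoder trained on bottleneck codes to predict one of these features would then be compared against `mean_baseline_mse`; a feature is only decodable from the bottleneck if the decoder's error falls below that baseline.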