Dynadiff: Single-stage Decoding of Images from Continuously Evolving fMRI

Marlène Careil, Yohann Benchetrit, Jean-Rémi King

TL;DR

Dynadiff introduces a single-stage diffusion-based decoder that reconstructs natural images directly from continuously evolving fMRI time series, addressing the limitations of time-collapsed preprocessing and multi-stage pipelines. A brain module maps fMRI sequences to embeddings that condition a pretrained diffusion model, and the two are trained jointly in a single stage; inference uses a DDIM scheduler for efficient denoising. On the Natural Scenes Dataset, Dynadiff delivers state-of-the-art time-resolved reconstructions, with the largest gains on high-level semantic metrics, and shows that time-aware decoding exposes how image representations evolve in brain activity. The work offers a practical, time-resolved brain-to-image decoding approach with implications for neuroprosthetics and neuroscience, while outlining ethical safeguards such as face-blurring and open research practices.

Abstract

Brain-to-image decoding has been recently propelled by the progress in generative AI models and the availability of large ultra-high field functional Magnetic Resonance Imaging (fMRI). However, current approaches depend on complicated multi-stage pipelines and preprocessing steps that typically collapse the temporal dimension of brain recordings, thereby limiting time-resolved brain decoders. Here, we introduce Dynadiff (Dynamic Neural Activity Diffusion for Image Reconstruction), a new single-stage diffusion model designed for reconstructing images from dynamically evolving fMRI recordings. Our approach offers three main contributions. First, Dynadiff simplifies training as compared to existing approaches. Second, our model outperforms state-of-the-art models on time-resolved fMRI signals, especially on high-level semantic image reconstruction metrics, while remaining competitive on preprocessed fMRI data that collapse time. Third, this approach allows a precise characterization of the evolution of image representations in brain activity. Overall, this work lays the foundation for time-resolved brain-to-image decoding.

Paper Structure

This paper contains 18 sections, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Schematic bird's-eye view of four seminal fMRI-to-image architectures: Brain-Diffusers [ozcelik2023brain], MindEye1 [scotti2023reconstructing], WAVE [wang2024wave], and MindEye2 [scotti2024mindeye2]. They all consist of multiple independent training modules and cannot be trained in a single stage. Except for WAVE, they preprocess fMRI data in a way that collapses time. In comparison to these pipelines, we illustrate the simplicity and time-resolved capability of our approach, Dynadiff, which is trained in a single stage on time series of fMRI activity.
  • Figure 2: The architecture of our brain module, corresponding to the only MLP block of our approach (see the architecture overview figure).
  • Figure 3: Qualitative comparisons of WAVE, MindEye1, MindEye2 and our model on the NSD dataset. The left column displays the image stimuli; the following columns show reconstructions from WAVE, MindEye1, MindEye2 and our model, in that order.
  • Figure 4: Real-time decoding of images using our specialized or general Dynadiff models. The "General" model $M_{gen}$ is trained on time windows $W(s,t,d)$ (with $t=3$ s and duration $d=8$ s), and we evaluate its generalization capabilities by reconstructing images from shifted windows $W(s,t+\delta,d)$. In the "Specialized" setting, we train a separate model for each shift $t+\delta$ on windows $W(s,t+\delta,d)$, so each column corresponds to a different model. Since participants see a stimulus every 4 seconds, $\delta = -3\cdot\text{TR}$ and $\delta = 3\cdot\text{TR}$ correspond to the windows of the previous and next image presentations, respectively. As expected, $M_{gen}$ can decode these images quite well.
  • Figure 5: Each point is obtained by reconstructing images from a different fMRI time window $W(s,t+\delta,d)$. We fix $t=3$ s and duration $d=8$ s and vary $\delta$ as explained in the Results section. The x-axis represents the end time of the time window, i.e., $t+\delta+d$. The orange curve is obtained with specialized models trained specifically for each time window $W(s,t+\delta,d)$, while the blue curve shows the performance of a general model trained at $W(s,t,d)$ and evaluated at shifted test windows. We report the standard error of the mean across the four NSD subjects. The shaded gray area indicates the 3 s interval during which images were presented to the participants.
  • ...and 4 more figures
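
The shifted windows $W(s,t+\delta,d)$ used in the captions of Figures 4 and 5 can be illustrated with a minimal sketch. It is not the authors' code. It assumes that $s$ denotes a stimulus, that $t$ is measured from stimulus onset, that frame $k$ of the recording is acquired at time $k\cdot\text{TR}$, and that $\text{TR} = 4/3$ s, a value inferred only from the caption's statement that $3\cdot\text{TR}$ equals the 4 s gap between stimuli. The names `window_bounds` and `extract_window` are hypothetical.

```python
import numpy as np

# Assumption: inferred from the Figure 4 caption (3 * TR = 4 s between stimuli).
TR = 4.0 / 3.0


def window_bounds(onset, t=3.0, d=8.0, delta_trs=0, tr=TR):
    """Start and end time (seconds) of W(s, t + delta, d), with delta = delta_trs * tr.

    `onset` is the stimulus onset time; treating t as relative to onset is an assumption.
    """
    start = onset + t + delta_trs * tr
    return start, start + d


def extract_window(bold, onset, t=3.0, d=8.0, delta_trs=0, tr=TR):
    """Slice a (n_voxels, n_frames) array down to the frames inside the window.

    Hypothetical helper: frame k is assumed to be acquired at time k * tr.
    """
    start, _ = window_bounds(onset, t, d, delta_trs, tr)
    first = int(round(start / tr))   # index of the first frame in the window
    n_frames = int(round(d / tr))    # number of frames covering the duration d
    if first < 0 or first + n_frames > bold.shape[1]:
        raise ValueError("window falls outside the recording")
    return bold[:, first:first + n_frames]


# Toy recording: 2 voxels, 60 frames, each value equal to its flat index.
bold = np.arange(2 * 60).reshape(2, 60)
w_train = extract_window(bold, onset=4.0)               # window W(s, t, d)
w_next = extract_window(bold, onset=4.0, delta_trs=3)   # shifted by +3 TR
```

Under these assumptions an 8 s window spans 6 frames, and a shift of $\delta = 3\cdot\text{TR}$ moves the window forward by exactly one stimulus interval, which is why that shift lines up with the next image presentation.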