DREAM: Visual Decoding from Reversing Human Visual System

Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue

TL;DR

DREAM reconstructs viewed scenes from fMRI by grounding visual decoding in the forward biology of the human visual system (HVS). It introduces two reverse pathways, R-VAC for semantics and R-PKM for color and depth, followed by Guided Image Reconstruction (GIR), which conditions a frozen Stable Diffusion model through T2I-Adapter guidance. The approach uses contrastive learning to align fMRI with CLIP embeddings and multi-stage RGBD training with cycle consistency to recover depth and color cues; together these yield more consistent appearance, structure, and semantics than prior methods. Evaluations on the NSD dataset show competitive or superior performance on both low- and high-level metrics, and ablations support the value of color guidance and of the strategies used to cope with data scarcity. The work offers a biologically inspired, modular framework for fMRI-to-image reconstruction with potential applications in neuroscience, assistive technology, and brain-computer interfaces; the authors state that code will be made publicly available.
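The contrastive alignment between fMRI and CLIP embeddings mentioned above can be illustrated with a small sketch. The summary only says that contrastive learning is used; the symmetric InfoNCE-style form, the function name `symmetric_contrastive_loss`, and the `temperature` default below are assumptions made for illustration, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(fmri_emb, clip_emb, temperature=0.07):
    """InfoNCE-style loss (assumed form): row i of fmri_emb should match
    row i of clip_emb and mismatch every other row in the batch."""
    f = [l2_normalize(v) for v in fmri_emb]
    c = [l2_normalize(v) for v in clip_emb]
    n = len(f)
    # Cosine similarities scaled by the temperature.
    logits = [
        [sum(a * b for a, b in zip(fi, cj)) / temperature for cj in c]
        for fi in f
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Cross-entropy with the diagonal as the target, in both directions.
    fmri_to_clip = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    clip_to_fmri = -sum(log_softmax(logits_t[i])[i] for i in range(n)) / n
    return 0.5 * (fmri_to_clip + clip_to_fmri)
```

Calling the function on a batch whose rows are correctly paired should give a lower loss than calling it on the same batch with the CLIP rows shuffled, which is the property a contrastive objective relies on.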

Abstract

In this work we present DREAM, an fMRI-to-image method for reconstructing viewed images from brain activity, grounded in fundamental knowledge of the human visual system. We craft reverse pathways that emulate the hierarchical and parallel nature of how humans perceive the visual world. These tailored pathways are specialized to decipher semantics, color, and depth cues from fMRI data, mirroring the forward pathways from visual stimuli to fMRI recordings. To do so, two components mimic the inverse processes within the human visual system: the Reverse Visual Association Cortex (R-VAC), which reverses the pathways of this brain region to extract semantics from fMRI data, and the Reverse Parallel PKM (R-PKM), which simultaneously predicts color and depth from fMRI signals. The experiments indicate that our method outperforms the current state-of-the-art models in terms of the consistency of appearance, structure, and semantics. Code will be made publicly available to facilitate further research in this field.

Paper Structure (33 sections, 11 equations, 14 figures, 5 tables)

Figures (14)

  • Figure 1: Forward and Reverse Cycle. Forward (HVS): visual stimuli $\mapsto$ color, depth, semantics $\mapsto$ fMRI; Reverse (DREAM): fMRI $\mapsto$ color, depth, semantics $\mapsto$ reconstructed images.
  • Figure 1: Functional Anatomy of Cortex. The functional localization in the human brain is based on findings from functional brain imaging, which link various anatomical regions of the brain to their associated functions. Source: https://upload.wikimedia.org/wikipedia/commons/d/db/Constudproc.png. This image is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
  • Figure 2: Appearance Inconsistency. When decoding the fMRI data of a subject viewing a test image (top), recent visual decoding methods, here \cite{ozcelik2023brain}, reconstruct images (bottom) that are semantically close but still suffer from strong color inconsistencies.
  • Figure 2: Depth and Color Representations. We present pseudo ground truth samples of Depth (MiDaS prediction \cite{ranftl2020towards}) and Color ($\times 64$ downsampling of the test image) for an NSD input image.
  • Figure 3: Relation of the HVS and our proposed DREAM. Grounded in the Human Visual System (HVS), we devise reverse pathways that decipher semantics, depth, and color cues from fMRI to guide image reconstruction. (Left) Schematic view of the HVS, detailed in \ref{sec:preliminary}. When perceiving visual stimuli, connections from the retina to the brain can be separated into two parallel pathways. The Parvocellular Pathway originates from midget cells in the retina and is responsible for transmitting color information, while the Magnocellular Pathway starts with parasol cells and is specialized in detecting depth and motion. The conveyed information is channeled into the visual cortex, which performs the intricate processing of high-level semantics from the visual image. (Right) DREAM mimics the corresponding inverse processes within the HVS: the Reverse VAC (\ref{subsec:method_vac}) replicates the opposite operations of this brain region, analogously extracting semantics $\hat{\texttt{S}}$ in the form of a CLIP embedding from fMRI; and the Reverse PKM (\ref{subsec:method_pkm}) maps fMRI to color $\hat{\texttt{C}}$ and depth $\hat{\texttt{D}}$ in the form of spatial palettes and depth maps. These are then processed by the Color Adapter (C-A) and the Depth Adapter (D-A) of T2I-Adapter \cite{mou2023t2i}, in conjunction with SD \cite{rombach2022high}, to reconstruct the image from the deciphered semantics, color, and depth cues $\{\hat{\texttt{S}}, \hat{\texttt{C}}, \hat{\texttt{D}}\}$.
  • ...and 9 more figures
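
The Color representation in the caption "Depth and Color Representations" is described as a $\times 64$ downsampling of the test image. The sketch below shows one way such a coarse spatial palette could be computed. The caption does not say which downsampling filter is used, so the block averaging here is an assumption, as are the function name `spatial_palette` and the nested-list image layout.

```python
def spatial_palette(image, factor=64):
    """Downsample an RGB image by averaging non-overlapping blocks.

    image: list of H rows, each a list of W (r, g, b) tuples.
    factor: block size; H and W must both be divisible by it.
    Returns a list of H/factor rows of W/factor averaged (r, g, b) tuples.
    Block averaging is an assumed filter, not one named by the source.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    area = factor * factor
    palette = []
    for top in range(0, height, factor):
        palette_row = []
        for left in range(0, width, factor):
            sums = [0, 0, 0]
            # Accumulate each channel over one factor x factor block.
            for y in range(top, top + factor):
                for x in range(left, left + factor):
                    pixel = image[y][x]
                    for ch in range(3):
                        sums[ch] += pixel[ch]
            palette_row.append(tuple(s / area for s in sums))
        palette.append(palette_row)
    return palette
```

For example, a 128 by 128 image downsampled with `factor=64` gives a 2 by 2 palette, with each cell holding the mean color of one 64 by 64 block. Such a grid keeps coarse color layout while discarding fine structure, which is the role the caption assigns to the Color cue.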