DREAM: Visual Decoding from Reversing Human Visual System
Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue
TL;DR
DREAM addresses the challenge of reconstructing viewed scenes from fMRI by grounding visual decoding in knowledge of the forward pathways of the human visual system. It introduces two reverse pathways, R-VAC for semantics and R-PKM for color and depth, followed by Guided Image Reconstruction (GIR), which conditions a frozen Stable Diffusion model through T2I-Adapter guidance. The approach uses contrastive learning to align fMRI signals with CLIP embeddings, and multi-stage RGBD training with cycle-consistency to recover depth and color cues. Together, these components yield more consistent appearance, structure, and semantics than prior methods. Evaluations on the Natural Scenes Dataset (NSD) show competitive or superior performance on both low- and high-level metrics, and ablations confirm the value of color guidance and of strategies for coping with data scarcity. The work offers a biologically inspired, modular framework for fMRI-to-image reconstruction with practical potential for neuroscience, assistive technology, and brain-computer interfaces, and provides publicly available code for future research.
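The contrastive alignment of fMRI signals with CLIP embeddings can be illustrated with a CLIP-style symmetric InfoNCE loss, a common objective for this kind of cross-modal alignment. This is a minimal NumPy sketch under stated assumptions: the summary does not specify DREAM's exact loss, temperature, or encoder, so the function names, the temperature of 0.07, and the random arrays standing in for fMRI-encoder outputs and CLIP image embeddings are all hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_style_contrastive_loss(fmri_emb, clip_emb, temperature=0.07):
    """Symmetric InfoNCE: row i of each matrix forms a positive pair,
    and every other row in the batch serves as a negative.

    Hypothetical helper; the temperature of 0.07 is an assumed value.
    """
    f, c = l2_normalize(fmri_emb), l2_normalize(clip_emb)
    logits = f @ c.T / temperature  # (batch, batch) scaled cosine similarities
    idx = np.arange(len(f))
    loss_f2c = -log_softmax(logits, axis=1)[idx, idx].mean()  # match each fMRI row to its image
    loss_c2f = -log_softmax(logits, axis=0)[idx, idx].mean()  # match each image to its fMRI row
    return 0.5 * (loss_f2c + loss_c2f)


rng = np.random.default_rng(0)
clip_emb = rng.normal(size=(8, 512))                         # stand-in CLIP image embeddings
aligned = clip_emb + 0.01 * rng.normal(size=clip_emb.shape)  # well-aligned fMRI embeddings
unrelated = rng.normal(size=(8, 512))                        # unaligned fMRI embeddings
print(clip_style_contrastive_loss(aligned, clip_emb) < clip_style_contrastive_loss(unrelated, clip_emb))  # True
```

Averaging both directions makes the objective symmetric, so the fMRI embedding must pick out its own image among the batch and vice versa; minimizing it pulls each fMRI embedding toward the CLIP embedding of the image that was viewed.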
Abstract
In this work we present DREAM, an fMRI-to-image method for reconstructing viewed images from brain activity, grounded in fundamental knowledge of the human visual system. We craft reverse pathways that emulate the hierarchical and parallel nature of how humans perceive the visual world. These tailored pathways are specialized to decipher semantics, color, and depth cues from fMRI data, mirroring the forward pathways from visual stimuli to fMRI recordings. Two components emulate the inverse processes within the human visual system: the Reverse Visual Association Cortex (R-VAC), which reverses the pathways of this brain region to extract semantics from fMRI data, and the Reverse Parallel PKM (R-PKM), which simultaneously predicts color and depth from fMRI signals. Experiments indicate that our method outperforms current state-of-the-art models in the consistency of appearance, structure, and semantics. Code will be made publicly available to facilitate further research in this field.
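The parallel structure of the two reverse pathways can be sketched as two independent heads that read the same fMRI batch: one producing a semantic embedding (the role of R-VAC) and one producing color and depth together in a single pass (the role of R-PKM). This is a structural sketch only: the class names `ReverseVAC` and `ReversePKM`, the toy dimensions, the sigmoid on color, and the random linear heads standing in for trained networks are all hypothetical and do not reproduce the authors' architecture.

```python
import numpy as np

# Toy sizes chosen for illustration only; they are not DREAM's actual dimensions.
N_VOXELS, SEM_DIM, H, W = 1000, 512, 16, 16


class LinearHead:
    """A single random linear map standing in for a trained decoder network."""

    def __init__(self, in_dim, out_dim, rng):
        self.weight = rng.normal(scale=in_dim ** -0.5, size=(in_dim, out_dim))
        self.bias = np.zeros(out_dim)

    def __call__(self, x):
        return x @ self.weight + self.bias


class ReverseVAC:
    """Hypothetical semantic pathway: fMRI -> unit-norm embedding in a CLIP-like space."""

    def __init__(self, rng):
        self.head = LinearHead(N_VOXELS, SEM_DIM, rng)

    def __call__(self, fmri):
        z = self.head(fmri)
        return z / np.linalg.norm(z, axis=-1, keepdims=True)


class ReversePKM:
    """Hypothetical color/depth pathway: one head predicts RGB and depth together."""

    def __init__(self, rng):
        # 3 color channels + 1 depth channel per output pixel.
        self.head = LinearHead(N_VOXELS, H * W * 4, rng)

    def __call__(self, fmri):
        rgbd = self.head(fmri).reshape(-1, H, W, 4)
        rgb = 1.0 / (1.0 + np.exp(-rgbd[..., :3]))  # sigmoid keeps color in [0, 1]
        depth = rgbd[..., 3]
        return rgb, depth


rng = np.random.default_rng(0)
r_vac, r_pkm = ReverseVAC(rng), ReversePKM(rng)

# Both pathways read the same fMRI batch independently, i.e. in parallel.
fmri = rng.normal(size=(2, N_VOXELS))
semantics = r_vac(fmri)
rgb, depth = r_pkm(fmri)
print(semantics.shape, rgb.shape, depth.shape)  # (2, 512) (2, 16, 16, 3) (2, 16, 16)
```

Keeping the pathways as separate modules mirrors the modular, parallel design described above: each output can be trained and inspected on its own before the outputs are used as conditions for image generation, a stage not reproduced here.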
