Connecting Dreams with Visual Brainstorming Instruction
Yasheng Sun, Bohan Li, Mingchen Zhuge, Deng-Ping Fan, Salman Khan, Fahad Shahbaz Khan, Hideki Koike
TL;DR
This work enables interactive control of brain-derived visual content by translating fMRI signals into imagery that can be steered with natural language. It introduces DreamConnect, a dual-stream diffusion framework that combines an adaptor, an asynchronous diffusion strategy, and LLM-guided region-aware manipulation to map brain activity to intention-driven edits. The approach builds on Stable Diffusion-like backbones and CLIP-based embeddings and is trained in two stages on the NSD dataset. In qualitative and quantitative tests, supported by comprehensive ablations, it shows competitive reconstruction performance and superior instruction-following. The work highlights the potential for multimodal brain-computer interfaces, acknowledges ethical considerations and limitations, and points to future work on internal dreams, small-object edits, and multi-turn interactions.
Abstract
Recent breakthroughs in understanding the human brain have revealed its impressive ability to efficiently process and interpret human thoughts, opening up possibilities for intervening in brain signals. In this paper, we aim to develop a straightforward framework that uses other modalities, such as natural language, to translate the original dreamland. We present DreamConnect, which employs a dual-stream diffusion framework to manipulate visually stimulated brain signals. By integrating an asynchronous diffusion strategy, our framework establishes an effective interface with human dreams, progressively refining their final imagery synthesis. Through extensive experiments, we demonstrate the method's ability to accurately instruct human brain signals with high fidelity. Our project will be publicly available at https://github.com/Sys-Nexus/DreamConnect
