Brain Dialogue Interface (BDI): A User-Friendly fMRI Model for Interactive Brain Decoding

Heng Huang, Lin Zhao, Zihao Wu, Xiaowei Yu, Jing Zhang, Xintao Hu, Dajiang Zhu, Tianming Liu

TL;DR

This work tackles the problem of aligning complex external stimuli with dynamic brain activity, a challenge for traditional static, target-based decoding. It introduces the Brain Dialogue Interface (BDI), a Transformer-based, unsupervised model that compresses whole-brain fMRI signals (approximately 185,751 voxels) into 32 injection signals to enable interactive, dialogue-like querying of brain states. Key contributions include a target-less decoding framework; a flexible query system (temporal, spatial, and multimodal via CLIP) for extracting event timing and content; and evidence that injection signals provide meaningful, task-relevant brain representations, with potential for zero-shot brain inquiries driven by text or images. Neurosynth pretraining and CLIP-based translation broaden accessibility and multimodal control, suggesting potential for dynamic brain-computer interfaces and human-in-the-loop neuroscience exploration.

Abstract

Brain decoding techniques are essential for understanding the neurocognitive system. Although numerous methods have been introduced in this field, accurately aligning complex external stimuli with brain activities remains a formidable challenge. To alleviate alignment difficulties, many studies have simplified their models by employing single-task paradigms and establishing direct links between brain/world through classification strategies. Despite improvements in decoding accuracy, this strategy frequently encounters issues with generality when adapting these models to various task paradigms. To address this issue, this study introduces a user-friendly decoding model that enables dynamic communication with the brain, as opposed to the static decoding approaches utilized by traditional studies. The model functions as a brain simulator, allowing for interactive engagement with the brain and enabling the decoding of a subject's experiences through dialogue-like queries. Uniquely, our model is trained in a completely unsupervised and task-free manner. Our experiments demonstrate the feasibility and versatility of our proposed method. Notably, our model demonstrates exceptional capabilities in signal compression, successfully representing the entire brain signal of approximately 185,751 voxels with just 32 signals. Furthermore, we show how our model can integrate seamlessly with multimodal models, thus enhancing the potential for controlling brain decoding through textual or image inputs.
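
To put the compression claim in concrete terms, the short calculation below works out how many voxels each injection signal stands in for. It uses only the two figures quoted in the abstract; the variable names are illustrative, and the ratio describes the spatial dimension only, not any temporal compression.

```python
# Figures quoted in the abstract.
n_voxels = 185_751           # approximate whole-brain voxel count
n_injection_signals = 32     # signals used to represent the whole brain

# Spatial compression ratio: voxels represented per injection signal.
compression_ratio = n_voxels / n_injection_signals
print(f"{compression_ratio:.1f} voxels per injection signal")  # prints "5804.7 voxels per injection signal"
```

In other words, each injection signal summarizes roughly 5,800 voxel time series.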

Paper Structure (16 sections, 10 equations, 19 figures, 9 tables, 1 algorithm)

Figures (19)

  • Figure 1: Overview of Our Decoding Framework. During the inference phase, the whole-brain activities (fMRI BOLD signals) of a subject we aim to decode are represented by 32 components, referred to as 'injection signals' in this study. These injection signals are then used to initialize the model. Once initialized, the model functions analogously to a real brain, enabling us to pose queries and engage in various types of inquiries. Specifically, we consider three scenarios based on whether we possess prior knowledge about the timing or content of events. The complexity of decoding increases with the scarcity of information. For each scenario, we design different types of queries, and the model responds accordingly. In addition to manually designed queries, we also explored the use of CLIP to automate the generation of queries and dialogue using straightforward textual or image inputs. For further details, please refer to Section III.
  • Figure 2: Schematic illustration of the model inputs. Part I displays the original form of the voxel signals, detailing both spatial location and temporal fluctuations. Part II shows the injection signals, including both the target brain locations for signal injection and the corresponding signal fluctuations.
  • Figure 3: Illustration of the construction process for the temporal query.
  • Figure 4: Query construction and brain responses in different decoding scenarios. (a) Fine-tuning the model to identify injection signals for a new subject, during which the model's weights are frozen. (b) Temporal query construction when the exact timing of an event is known, but the contents of the event are unknown. (c) Spatial query construction for known event contents when the exact timing is unknown. The spatial query acts like 'words' in a sentence, which we use to initiate a dialogue with the brain.
  • Figure 5: Illustration of Aligning Text/Image with Brain Activities for Spatial Query Translation. During the pretraining phase, Neurosynth data, which includes pairs of text and brain maps, is utilized to optimize the parameters of the spatial query layer. In the finetuning phase, the BDI model is integrated as a discriminator to enhance the accuracy of spatial query translation. During evaluation, a series of hypotheses are formulated to assess the brain states of subjects.
  • ...and 14 more figures
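
The caption of Figure 4(a) describes fitting injection signals for a new subject while the model's weights stay frozen. The sketch below illustrates that idea only in a toy setting: it assumes a fixed linear map from 32 signals to voxels in place of the paper's Transformer, a squared reconstruction loss, and plain gradient descent. The function name, the linear decoder, the loss, and the reduced voxel count are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def fit_injection_signals(bold, frozen_decoder, n_steps=200):
    """Fit per-subject signals by gradient descent; the decoder is never updated.

    bold:           (n_voxels, n_timepoints) observed signals
    frozen_decoder: (n_voxels, n_signals) fixed weights (toy stand-in for the model)
    """
    n_signals = frozen_decoder.shape[1]
    signals = np.zeros((n_signals, bold.shape[1]))

    # Step size 1/L, where L is the squared largest singular value of the
    # decoder; this guarantees the quadratic loss never increases.
    step = 1.0 / np.linalg.norm(frozen_decoder, ord=2) ** 2

    losses = []
    for _ in range(n_steps):
        residual = frozen_decoder @ signals - bold
        losses.append(0.5 * float(np.sum(residual ** 2)))
        # Gradient of the loss with respect to the signals only.
        signals -= step * (frozen_decoder.T @ residual)
    return signals, losses


# Toy data: 500 voxels instead of ~185,751 so the example runs quickly.
rng = np.random.default_rng(0)
decoder = rng.standard_normal((500, 32))
decoder_before = decoder.copy()
bold = decoder @ rng.standard_normal((32, 40))

signals, losses = fit_injection_signals(bold, decoder)
print(signals.shape, losses[-1] < losses[0])
```

Only `signals` changes during the loop; `decoder` is read but never written, which mirrors the frozen-weights setup in spirit. The actual model is nonlinear, so in practice this step would rely on automatic differentiation rather than a closed-form gradient.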