BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance

Ling Wang, Chen Wu, Lin Wang

TL;DR

This paper introduces BrainDreamer, an end-to-end language-guided generative framework that mimics human reasoning to generate high-quality images from electroencephalogram (EEG) brain signals. It significantly outperforms prior methods in generation quality and quantitative performance.

Abstract

Can we directly visualize what we imagine in our brain together with what we describe? The inherent nature of human perception reveals that, when we think, we can combine a language description with a vivid mental picture. Intuitively, generative models should also have such versatility. In this paper, we introduce BrainDreamer, a novel end-to-end language-guided generative framework that can mimic human reasoning and generate high-quality images from electroencephalogram (EEG) brain signals. Our method is superior in its capacity to eliminate the noise introduced by non-invasive EEG data acquisition while achieving a more precise mapping between the EEG and image modalities, leading to significantly better generated images. Specifically, BrainDreamer consists of two key learning stages: 1) modality alignment and 2) image generation. In the alignment stage, we propose a novel mask-based triple contrastive learning strategy that aligns EEG, text, and image embeddings to learn a unified representation. In the generation stage, we inject the EEG embeddings into the pre-trained Stable Diffusion model through a learnable EEG adapter to generate high-quality, reasoning-coherent images. Moreover, BrainDreamer can accept textual descriptions (e.g., color, position, etc.) to achieve controllable image generation. Extensive experiments show that our method significantly outperforms prior methods in generation quality and quantitative performance.
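
The alignment stage can be illustrated with a small contrastive-loss sketch. The abstract does not give the loss itself, so this assumes a symmetric InfoNCE objective summed over EEG-image and EEG-text pairs. The function names, the temperature value, and the choice of pairs are illustrative assumptions rather than the authors' formulation. It is written in plain Python to stay self-contained; a practical version would use a tensor library.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(batch_a, batch_b, temperature=0.07):
    """Symmetric InfoNCE loss; batch_a[i] and batch_b[i] form a positive pair."""
    a = [l2_normalize(v) for v in batch_a]
    b = [l2_normalize(v) for v in batch_b]
    n = len(a)
    # logits[i][j]: temperature-scaled cosine similarity between a[i] and b[j].
    logits = [
        [sum(x * y for x, y in zip(a[i], b[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def diagonal_cross_entropy(rows):
        # Cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)
            log_z = peak + math.log(sum(math.exp(x - peak) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


def eeg_alignment_loss(eeg, image, text, temperature=0.07):
    """Assumed objective: pull EEG embeddings toward paired image and text embeddings."""
    return info_nce(eeg, image, temperature) + info_nce(eeg, text, temperature)
```

With correctly paired embeddings the loss approaches zero, and it grows when the pairing is shuffled, which is the signal that would drive an EEG encoder toward the shared embedding space.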

Paper Structure (17 sections, 9 equations, 11 figures, 4 tables)

Figures (11)

  • Figure 1: Diverse generation and creation results of our BrainDreamer with text guidance. BrainDreamer can achieve high-quality, reasoning-coherent, and controllable image generation from different textual descriptions, such as ["Sunset"], ["Aurora borealis"].
  • Figure 2: Overview of our BrainDreamer. After aligning the EEG signals, images, and text using a mask-based triple contrastive learning strategy, we design an EEG adapter based on the trained EEG encoder. The EEG adapter employs FiLM (feature-wise linear modulation) to modulate the EEG embeddings. Then, the EEG and text embeddings are fed into the pre-trained Stable Diffusion model to generate reasoning-coherent images.
  • Figure 3: Mask-based triple contrastive learning. We leverage CLIP's image and text encoders to assist in training the EEG encoder. Also, during training, random masks are applied to both the image and EEG data to enhance feature robustness and reduce training costs.
  • Figure 4: Qualitative comparison with Brain2Image [kavasidis2017brain2image] and DreamDiffusion [bai2023dreamdiffusion] for EEG-to-image generation (Setting 1).
  • Figure 5: Some results of directly generating images from EEG signals using BrainDreamer (Setting 1). The images on the left depict paired image data, while the three images on the right represent the sampling results.
  • ...and 6 more figures
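
Two mechanisms named in the captions for Figures 2 and 3, FiLM modulation and random masking, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the captions do not say what conditions the FiLM layer, how the scale and shift are produced, or what mask ratio is used. The linear projections, the parameter layout, and every name below are therefore hypothetical.

```python
import random


def random_mask(values, mask_ratio, rng):
    """Zero out round(mask_ratio * len(values)) randomly chosen entries."""
    count = round(mask_ratio * len(values))
    masked = set(rng.sample(range(len(values)), count))
    return [0.0 if i in masked else v for i, v in enumerate(values)]


def linear(weights, bias, x):
    """Plain affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(features, condition, gamma_params, beta_params):
    """FiLM: per-feature scale (gamma) and shift (beta) predicted from `condition`.

    gamma_params and beta_params are (weights, bias) pairs for two hypothetical
    linear projections; in a trained adapter they would be learned.
    """
    gamma = linear(gamma_params[0], gamma_params[1], condition)
    beta = linear(beta_params[0], beta_params[1], condition)
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


# Usage: mask an EEG-like vector, then modulate a 2-dimensional embedding.
rng = random.Random(0)
masked_signal = random_mask([1.0] * 10, mask_ratio=0.3, rng=rng)

identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # gamma equals the condition
constant = ([[0.0, 0.0], [0.0, 0.0]], [1.0, 1.0])  # beta is always [1, 1]
modulated = film([2.0, 3.0], [2.0, 0.5], identity, constant)
```

Here the scale and shift are both functions of the conditioning vector, which is the defining property of FiLM. Which signal plays that role in the EEG adapter is left open by the captions.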