
DreamDiffusion: Generating High-Quality Images from Brain EEG Signals

Yunpeng Bai, Xintao Wang, Yan-pei Cao, Yixiao Ge, Chun Yuan, Ying Shan

TL;DR

DreamDiffusion shows that high-quality images can be generated directly from EEG signals by leveraging pre-trained text-to-image diffusion models. It addresses EEG-specific challenges with temporal masked signal pre-training and CLIP-based alignment that bridges the EEG, text, and image spaces, enabling effective conditioning of Stable Diffusion even with limited EEG-image pairs. The approach yields promising qualitative and quantitative results, marking a step toward portable, low-cost thoughts-to-image generation with potential utility in neuroscience, computer vision, psychotherapy, and human-computer interaction. The authors acknowledge that current results capture only coarse, category-level information.
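The temporal masked signal pre-training can be illustrated with a minimal NumPy sketch of the data side of a masked-autoencoder objective. It assumes an EEG recording stored as a (channels, time) array, non-overlapping temporal patches that each span all channels, a 75% mask ratio, and a mean-squared-error loss computed only on masked patches. The patch length, mask ratio, array sizes, and every function name here are illustrative assumptions, not values or code from the DreamDiffusion repository, and the encoder/decoder networks are omitted.

```python
import numpy as np

def temporal_patchify(eeg, patch_len):
    """Split an EEG array (channels, time) into non-overlapping temporal patches.

    Returns shape (num_patches, channels * patch_len): each row is one token
    covering all channels over `patch_len` consecutive time steps.
    """
    channels, time = eeg.shape
    num_patches = time // patch_len
    trimmed = eeg[:, : num_patches * patch_len]          # drop leftover samples
    patches = trimmed.reshape(channels, num_patches, patch_len)
    return patches.transpose(1, 0, 2).reshape(num_patches, channels * patch_len)

def random_temporal_mask(num_patches, mask_ratio, rng):
    """Boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_reconstruction_loss(target_patches, predicted_patches, mask):
    """Mean squared error averaged over masked patches only (MAE-style)."""
    per_patch = ((predicted_patches - target_patches) ** 2).mean(axis=1)
    return per_patch[mask].mean()

# Usage with hypothetical sizes: 128 channels, 512 time steps, patches of 4 steps.
rng = np.random.default_rng(0)
eeg = rng.standard_normal((128, 512))
patches = temporal_patchify(eeg, patch_len=4)            # (128 tokens, 512 features)
mask = random_temporal_mask(len(patches), mask_ratio=0.75, rng=rng)
visible = patches[~mask]                                 # only these reach the encoder
# A trained decoder would predict the masked patches; zeros are a placeholder here.
loss = masked_reconstruction_loss(patches, np.zeros_like(patches), mask)
```

Masking along time rather than across channels forces the encoder to infer missing stretches of the signal from surrounding context, which is the property the pre-training stage is meant to exploit on large, noisy EEG corpora.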

Abstract

This paper introduces DreamDiffusion, a novel method for generating high-quality images directly from brain electroencephalogram (EEG) signals, without the need to translate thoughts into text. DreamDiffusion leverages pre-trained text-to-image models and employs temporal masked signal modeling to pre-train the EEG encoder for effective and robust EEG representations. Additionally, the method uses the CLIP image encoder to provide extra supervision that better aligns EEG, text, and image embeddings despite limited EEG-image pairs. Overall, the proposed method addresses the challenges of using EEG signals for image generation, such as noise, limited information, and individual differences, and achieves promising results. Quantitative and qualitative results demonstrate the effectiveness of the proposed method as a significant step towards portable and low-cost "thoughts-to-image" generation, with potential applications in neuroscience and computer vision. The code is available at https://github.com/bbaaii/DreamDiffusion.
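The CLIP-based supervision can be sketched as an alignment term between a projected EEG embedding and the frozen CLIP image embedding of the paired image. This NumPy sketch assumes one pooled EEG embedding per sample, a hypothetical linear map `project` standing in for a projection layer, and a cosine-distance loss; the loss form, dimensions, and names are illustrative assumptions rather than the repository's code. Only the alignment term is shown, not the diffusion fine-tuning objective it would accompany, and NumPy evaluates just the forward value (a real setup would use an autodiff framework).

```python
import numpy as np

def project(eeg_embedding, weight, bias):
    """Hypothetical linear projection mapping EEG embeddings into CLIP image-embedding space."""
    return eeg_embedding @ weight + bias

def clip_alignment_loss(eeg_proj, clip_image_emb, eps=1e-8):
    """Batch-mean cosine distance: 1 - cos(projected EEG, CLIP image embedding).

    Ranges from 0 (same direction) to 2 (opposite direction).
    """
    dot = np.sum(eeg_proj * clip_image_emb, axis=1)
    norms = np.linalg.norm(eeg_proj, axis=1) * np.linalg.norm(clip_image_emb, axis=1)
    return float(np.mean(1.0 - dot / (norms + eps)))

# Usage with hypothetical sizes and random stand-in features.
rng = np.random.default_rng(0)
batch, eeg_dim, clip_dim = 4, 1024, 768
eeg_emb = rng.standard_normal((batch, eeg_dim))
W = rng.standard_normal((eeg_dim, clip_dim)) / np.sqrt(eeg_dim)
b = np.zeros(clip_dim)
image_emb = rng.standard_normal((batch, clip_dim))   # stand-in for frozen CLIP image features
loss = clip_alignment_loss(project(eeg_emb, W, b), image_emb)
```

The intended effect, as the abstract describes it, is that CLIP's image and text embeddings are already aligned, so pulling EEG embeddings toward CLIP image embeddings should also move them closer to the text-embedding space that Stable Diffusion is conditioned on.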

Paper Structure

This paper contains 15 sections, 3 equations, 7 figures, and 1 table.

Figures (7)

  • Figure 1: Our proposed DreamDiffusion is capable of generating high-quality images directly from brain electroencephalogram (EEG) signals, without the need to translate thoughts into text.
  • Figure 2: Overview of DreamDiffusion. Our method comprises three main components: 1) masked signal pre-training for an effective and robust EEG encoder, 2) fine-tuning pre-trained Stable Diffusion with limited EEG-image pairs, and 3) aligning the EEG, text, and image spaces using CLIP encoders.
  • Figure 3: Masked signal modeling with large-scale noisy EEG data. We visualize the reconstruction of one channel of the EEG data. The overall trend is reconstructed accurately, but the details are affected by the datasets, whose EEG signals are relatively noisy.
  • Figure 4: Main results. The images on the left show the paired image data, while the three images on the right show sampling results. Our model generates high-quality images from the EEG data, and these images match the EEG data well.
  • Figure 5: Comparison with Brain2Image. Images generated by DreamDiffusion are of significantly higher quality than those generated by Brain2Image.
  • ...and 2 more figures
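The conditioning path in Figure 2, where EEG features take the place of the text embedding that Stable Diffusion's cross-attention layers normally receive, can be sketched as follows. Assumptions: the EEG encoder emits a token sequence; a hypothetical pair of linear maps (`eeg_to_condition`) reshapes it to the 77 × 768 layout of the CLIP text embeddings used by Stable Diffusion v1.x; and a single attention head stands in for the UNet's multi-head cross-attention. All sizes and weights are random placeholders, not the repository's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def eeg_to_condition(eeg_tokens, w_tok, w_dim):
    """Hypothetical map from EEG encoder output (n_eeg, d_eeg) to a (77, 768) sequence.

    w_tok (77, n_eeg) mixes tokens; w_dim (d_eeg, 768) changes the feature width.
    """
    return w_tok @ eeg_tokens @ w_dim

def cross_attention(latent_tokens, condition, w_q, w_k, w_v):
    """Single-head cross-attention: queries from image latents, keys/values from the EEG condition."""
    q = latent_tokens @ w_q
    k = condition @ w_k
    v = condition @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])   # scaled dot-product scores (n_latent, 77)
    return softmax(scores, axis=-1) @ v

# Usage with hypothetical sizes.
rng = np.random.default_rng(0)
n_eeg, d_eeg, d_latent, d_head = 128, 1024, 320, 64
eeg_tokens = rng.standard_normal((n_eeg, d_eeg))
cond = eeg_to_condition(
    eeg_tokens,
    rng.standard_normal((77, n_eeg)) / np.sqrt(n_eeg),
    rng.standard_normal((d_eeg, 768)) / np.sqrt(d_eeg),
)
latents = rng.standard_normal((64, d_latent))   # e.g. a flattened 8x8 latent grid
out = cross_attention(
    latents,
    cond,
    rng.standard_normal((d_latent, d_head)) * 0.05,
    rng.standard_normal((768, d_head)) * 0.05,
    rng.standard_normal((768, d_head)) * 0.05,
)
```

Matching the shape of the text embedding is what lets a pre-trained text-to-image UNet accept EEG conditioning without architectural changes; only the mapping into that space has to be learned.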