Uni-Neur2Img: Unified Neural Signal-Guided Image Generation, Editing, and Stylization via Diffusion Transformers

Xiyue Bai, Ronghao Yu, Jia Xiu, Pengfei Zhou, Jie Xia, Peng Ji

TL;DR

Uni-Neur2Img introduces a unified diffusion-transformer framework that conditions image generation, editing, and stylization directly on neural signals (EEG). It combines a LoRA-based neural-signal injection module with a causal mutual-attention mechanism to enable flexible, parameter-efficient multi-modal conditioning without retraining the base model, and it introduces the EEG-Style dataset for EEG-driven style transfer. The approach delivers state-of-the-art results for EEG-driven generation, editing, and stylization on three datasets, with strong fidelity, semantic alignment, and efficiency, demonstrating that EEG encodes mid- to low-level visual and stylistic priors that text prompts alone struggle to capture. This work advances neuroscience-vision interfaces for assistive and creative applications while addressing important safety and ethical considerations around neural-data privacy and responsible use.

Abstract

Generating or editing images directly from neural signals has immense potential at the intersection of neuroscience, vision, and brain-computer interaction. In this paper, we present Uni-Neur2Img, a unified framework for neural signal-driven image generation and editing. The framework introduces a parameter-efficient LoRA-based neural signal injection module that independently processes each conditioning signal as a pluggable component, enabling flexible multi-modal conditioning without altering the base model parameters. Additionally, we employ a causal attention mechanism to accommodate the long-sequence modeling demands of conditional generation tasks. Existing neural-driven generation research predominantly uses text as the condition or as an intermediate representation, leaving visual modalities largely unexplored as direct conditioning signals. To bridge this gap, we introduce EEG-Style, a dataset for EEG-driven style transfer. We conduct comprehensive evaluations across public benchmarks and self-collected neural signal datasets: (1) EEG-driven image generation on the public CVPR40 dataset; (2) neural signal-guided image editing on the public LoongX dataset for semantic-aware local modifications; and (3) EEG-driven style transfer on our self-collected EEG-Style dataset. Extensive experimental results demonstrate significant improvements in generation fidelity, editing consistency, and style transfer quality while maintaining low computational overhead and strong scalability to additional modalities. Thus, Uni-Neur2Img offers a unified, efficient, and extensible solution for bridging neural signals and visual content generation.
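The abstract describes the injection module and the causal attention only at a high level. The sketch below illustrates one plausible reading of each, under stated assumptions: a frozen base projection with a separate low-rank (LoRA) adapter per conditioning signal, applied only to that signal's tokens, and an attention mask in which text and image tokens attend to every token while each condition's tokens attend only to themselves. The names `LoRAAdapter`, `ConditionLoRALinear`, and `build_attention_mask`, the `alpha / r` scaling, and the exact masking rule are hypothetical illustrations, not the authors' implementation; plain Python lists stand in for tensors.

```python
from typing import Dict, List, Sequence, Set, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRAAdapter:
    """Hypothetical per-condition low-rank update: delta(x) = (alpha / r) * B (A x)."""

    def __init__(self, a: Matrix, b: Matrix, alpha: float) -> None:
        self.a = a  # shape r x d_in
        self.b = b  # shape d_out x r
        self.scale = alpha / len(a)  # r is the number of rows of A

    def delta(self, x: Sequence[float]) -> Vector:
        return [self.scale * y for y in matvec(self.b, matvec(self.a, x))]


class ConditionLoRALinear:
    """Frozen base projection plus pluggable adapters keyed by token source (assumed design)."""

    def __init__(self, weight: Matrix) -> None:
        self.weight = weight  # base weight: read, never updated
        self.adapters: Dict[str, LoRAAdapter] = {}

    def plug(self, source: str, adapter: LoRAAdapter) -> None:
        self.adapters[source] = adapter

    def unplug(self, source: str) -> None:
        self.adapters.pop(source, None)

    def forward(self, tokens: List[Vector], sources: List[str]) -> List[Vector]:
        out = []
        for x, src in zip(tokens, sources):
            y = matvec(self.weight, x)
            adapter = self.adapters.get(src)
            if adapter is not None:  # only tokens of that condition receive the update
                y = [yi + di for yi, di in zip(y, adapter.delta(x))]
            out.append(y)
        return out


def build_attention_mask(segments: List[Tuple[str, int]], shared: Set[str]) -> List[List[bool]]:
    """Return mask[q][k], True when query token q may attend to key token k.

    Assumed rule: tokens in `shared` segments (e.g. text, noisy image) see every
    token; each condition segment sees only its own tokens, so conditions never
    mix and their keys/values do not depend on the image tokens.
    """
    labels = [name for name, length in segments for _ in range(length)]
    return [[q in shared or q == k for k in labels] for q in labels]


# Usage: identity base weight, one rank-1 adapter plugged in for "eeg" tokens.
layer = ConditionLoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]])
layer.plug("eeg", LoRAAdapter(a=[[1.0, 1.0]], b=[[1.0], [0.0]], alpha=1.0))
print(layer.forward([[1.0, 2.0], [1.0, 2.0]], ["image", "eeg"]))  # [[1.0, 2.0], [4.0, 2.0]]
mask = build_attention_mask([("text", 1), ("image", 2), ("eeg", 2)], {"text", "image"})
print(mask[3])  # [False, False, False, True, True]
```

Because the base weight is never written in this sketch, unplugging an adapter restores the original projection exactly, which is what makes each condition a pluggable component here; the block-structured mask also leaves condition-to-image and condition-to-condition interactions empty, one way such a design could limit cost on long concatenated sequences.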

Paper Structure

This paper contains 28 sections, 11 equations, 11 figures, 8 tables.

Figures (11)

  • Figure 1: Illustration of Uni-Neur2Img for Image Generation, Editing, and Stylization.
  • Figure 2: An overview of our proposed Unified Neural Signal-Guided Image Generation, Editing, and Stylization framework based on Diffusion Transformers, termed Uni-Neur2Img.
  • Figure 3: EEG-Driven Image Generation on the CVPR40 Dataset: Qualitative Comparison of Uni-Neur2Img with DreamDiffusion, GWIT, and Original Images.
  • Figure 4: Qualitative Comparison of Uni-Neur2Img on the LoongX Dataset: Original, Ground Truth, EEG+Text, and EEG-Only Results Across (a) Background, (b) Object, (c) Global, and (d) Text Editing.
  • Figure 5: Comparison of EEG-Based Style Transfer Results with Ground Truth Stylizations and Original Images on the EEG-Style Dataset.
  • ...and 6 more figures