NeuroVolve: Evolving Visual Stimuli toward Programmable Neural Objectives
Haomiao Chen, Keith W Jamison, Mert R. Sabuncu, Amy Kuceyeski
TL;DR
NeuroVolve is a flexible framework for brain-guided image synthesis that optimizes in the embedding space of a vision-language model to satisfy programmable neural objectives across brain regions. By coupling a voxelwise encoding model with embedding-space optimization and a diffusion-based image generator, it yields semantic trajectories that unify image editing and preferred-stimulus generation while capturing subject-specific neural tuning. The approach recovers known region selectivity, reveals low-level feature tuning in early visual areas, and supports complex, multi-region constraints such as co-activation and suppression. This data-driven, personalized synthesis tool offers a new avenue for mapping visual representations and for potential brain-computer interface applications.
Abstract
What visual information is encoded in individual brain regions, and how do distributed patterns combine to form neural representations? Prior work has used generative models to replicate known category selectivity in isolated regions (e.g., faces in FFA), but these approaches offer limited insight into how regions interact during complex, naturalistic vision. We introduce NeuroVolve, a generative framework that performs brain-guided image synthesis by optimizing a neural objective function in the embedding space of a pretrained vision-language model. Images are generated under the guidance of a programmable neural objective, i.e., activating or deactivating a single region or multiple regions together. We validate NeuroVolve by recovering known selectivity for individual brain regions, then extend it to synthesize coherent scenes that satisfy complex, multi-region constraints. By tracking optimization steps, it reveals semantic trajectories through embedding space, unifying brain-guided image editing and preferred stimulus generation in a single process. We show that NeuroVolve can generate both low-level and semantic feature-specific stimuli for single ROIs, as well as stimuli aligned to curated neural objectives. These include co-activation and decorrelation between regions, exposing cooperative and antagonistic tuning relationships. Notably, the framework captures subject-specific preferences, supporting personalized brain-driven synthesis and offering interpretable constraints for mapping, analyzing, and probing neural representations of visual information.
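To make the idea of a programmable neural objective concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes a random linear map in place of the voxelwise encoding model, two hypothetical ROIs (`roi_a`, `roi_b`), and a signed weight per ROI (+1 to activate, -1 to suppress), and it omits the vision-language model and the diffusion-based generator entirely. It only illustrates how an objective over ROI responses can be optimized in an embedding space while recording the trajectory of intermediate embeddings.

```python
import numpy as np


def roi_means(embedding, weights, roi_masks):
    """Mean predicted response per ROI under a toy linear encoder (assumption)."""
    voxels = weights @ embedding  # shape: (n_voxels,)
    return {name: float(voxels[mask].mean()) for name, mask in roi_masks.items()}


def neural_objective(embedding, weights, roi_masks, targets):
    """Signed sum of ROI means: positive sign activates, negative suppresses."""
    means = roi_means(embedding, weights, roi_masks)
    return sum(sign * means[name] for name, sign in targets.items())


def objective_gradient(weights, roi_masks, targets):
    """Gradient w.r.t. the embedding; constant because the toy encoder is linear."""
    grad = np.zeros(weights.shape[1])
    for name, sign in targets.items():
        grad += sign * weights[roi_masks[name]].mean(axis=0)
    return grad


def evolve_embedding(start, weights, roi_masks, targets, steps=50, lr=0.1):
    """Projected gradient ascent that keeps the embedding at its starting norm.

    Returns the list of embeddings visited, i.e. the optimization trajectory.
    """
    radius = np.linalg.norm(start)
    grad = objective_gradient(weights, roi_masks, targets)
    z = start.copy()
    trajectory = [z.copy()]
    for _ in range(steps):
        z = z + lr * grad                      # ascent step on the objective
        z = z * radius / np.linalg.norm(z)     # project back onto the sphere
        trajectory.append(z.copy())
    return trajectory


# Toy setup: all sizes, ROI names, and weights here are hypothetical.
rng = np.random.default_rng(0)
n_voxels, dim = 200, 32
weights = rng.normal(size=(n_voxels, dim))
voxel_index = np.arange(n_voxels)
roi_masks = {"roi_a": voxel_index < 100, "roi_b": voxel_index >= 100}
targets = {"roi_a": 1.0, "roi_b": -1.0}  # activate roi_a, suppress roi_b

start = rng.normal(size=dim)
trajectory = evolve_embedding(start, weights, roi_masks, targets)
start_score = neural_objective(trajectory[0], weights, roi_masks, targets)
final_score = neural_objective(trajectory[-1], weights, roi_masks, targets)
print(f"objective: start={start_score:.3f}, final={final_score:.3f}")
```

In this sketch, changing `targets` (for instance, setting both signs to +1 for co-activation) reprograms the objective without touching the optimizer. In a full pipeline, each embedding along `trajectory` would presumably be decoded into an image by a generator, which is outside the scope of this toy example.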
