neuro2voc: Decoding Vocalizations from Neural Activity
Fei Gao
TL;DR
This study addresses decoding vocalizations from high-resolution neural recordings in zebra finches, tackling the sparsity and long-range dependencies of spike trains. It evaluates a spectrum of methods: classical baselines (SVM, RF, XGBoost) with SHAP analysis, state-space models (Mamba, Mamba-2), token-based NLP approaches (GPT-2), temporal deep nets (EEGNet, LSTM variants), and modern representation learning (CEBRA, VAEs), applied to syllable classification and spectrogram reconstruction. The work demonstrates that population-level features (e.g., total spike counts, mean firing rates, TF-IDF-like representations) can achieve strong syllable classification, and that contrastive learning combined with VAEs enables cross-modal generation of vocal data from neural inputs, with canonical correlations between the warped neural and vocal latent spaces. It also reveals limitations such as data volume requirements, generalization across individuals, and the need for motor-pathway data to improve biological plausibility. Overall, neuro2voc establishes a multi-faceted framework for neural decoding and cross-modal generation, and outlines practical directions for NeuroAI-inspired decoding of complex motor outputs.
Abstract
Accurately decoding neural spike trains and relating them to motor output is challenging because spike trains are sparse and long and brain circuits are complex. This master's project investigates experimental methods for decoding zebra finch motor outputs, both as discrete syllables and as continuous spectrograms, from invasive neural recordings obtained with Neuropixels probes. There are three major achievements: (1) XGBoost trained on spike rates, combined with SHAP analysis, revealed neuronal interaction patterns crucial for syllable classification. (2) A novel method (tokenizing neural data with GPT-2) and a novel architecture (Mamba-2) demonstrated potential for decoding syllables from spikes. (3) A combined contrastive learning-VAE framework successfully generated spectrograms from binned neural data. This work establishes a promising foundation for neural decoding of complex motor outputs and offers several novel methodological approaches for processing sparse neural data.
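To make the notions of binned neural data and population-level features concrete, the following is a minimal sketch, not the project's actual pipeline. It assumes spikes arrive as (neuron index, time in seconds) pairs; the function names `bin_spikes`, `population_features`, and `tfidf_like` are hypothetical, and the TF-IDF-like weighting (each trial treated as a "document", each neuron as a "term", with a smoothed inverse document frequency) is one plausible reading of the term rather than the definition used in this work.

```python
import math


def bin_spikes(spike_times, n_neurons, t_start, t_stop, bin_size):
    """Count spikes per neuron per time bin.

    spike_times: iterable of (neuron_index, time_in_seconds) pairs (assumed format).
    Returns n_neurons rows, each holding one integer count per bin.
    """
    n_bins = max(1, math.ceil((t_stop - t_start) / bin_size))
    counts = [[0] * n_bins for _ in range(n_neurons)]
    for neuron, t in spike_times:
        if t_start <= t < t_stop:  # spikes outside the window are ignored
            b = min(int((t - t_start) / bin_size), n_bins - 1)
            counts[neuron][b] += 1
    return counts


def population_features(counts, bin_size):
    """Return (total spike count, mean firing rate in Hz) per neuron."""
    duration = len(counts[0]) * bin_size
    totals = [sum(row) for row in counts]
    rates = [c / duration for c in totals]
    return totals, rates


def tfidf_like(trial_counts):
    """Weight per-trial spike counts in a TF-IDF-like way (assumed definition).

    trial_counts: list of trials, each a list of per-neuron spike counts.
    tf  = a neuron's share of the spikes in that trial
    idf = log((1 + n_trials) / (1 + trials in which the neuron fired)) + 1
    """
    n_trials = len(trial_counts)
    n_neurons = len(trial_counts[0])
    df = [sum(1 for trial in trial_counts if trial[j] > 0) for j in range(n_neurons)]
    idf = [math.log((1 + n_trials) / (1 + d)) + 1.0 for d in df]
    weighted = []
    for trial in trial_counts:
        total = sum(trial)
        weighted.append(
            [(c / total if total else 0.0) * idf[j] for j, c in enumerate(trial)]
        )
    return weighted


# Usage with made-up spikes: two neurons, a 2 s window, 0.5 s bins.
spikes = [(0, 0.1), (0, 0.6), (1, 1.2), (0, 1.9), (1, 2.5)]
counts = bin_spikes(spikes, n_neurons=2, t_start=0.0, t_stop=2.0, bin_size=0.5)
totals, rates = population_features(counts, bin_size=0.5)
weights = tfidf_like([[3, 1], [2, 0]])
```

Feature vectors of this kind (per-neuron totals, rates, or weighted counts per trial) could then be passed to a classifier such as XGBoost; the bin size and window here are illustrative values only.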
