Synthetic Data Generation for Classifying Electrophysiological and Morpho-Electrophysiological Neurons from Mouse Visual Cortex
Xavier Vasques, Laura Cif
TL;DR
This study benchmarks classical and deep generative augmentation methods for classifying Allen electrophysiology-defined e-types in the mouse visual cortex, comparing E→e-type and M+E→e-type tasks. It finds that SMOTE offers the most robust gains, especially when augmentation is applied in the native high-dimensional feature space, with hold-out accuracy rising to roughly 0.72–0.76 for E→e-type and 0.85–0.90 for M+E→e-type; deep generative models provide moderate, context-dependent improvements. A biologically anchored fidelity framework using KS tests, MAE, Euclidean distances, and a Mann–Whitney variability baseline shows SMOTE-generated samples reside within biologically plausible diversity, highlighting persistent challenges for rare/inhibitory subclasses. The results give practical guidance for scalable neuron-type classification and point to future work on reduction-aware generative models and targeted data collection to improve fidelity for hard cases. Overall, the work supports synthetic augmentation as a complementary tool to multimodal neuronal mapping, enabling more robust classification while preserving biological interpretability.
Abstract
The accurate classification of neuronal cell types is central to decoding brain function, yet remains hindered by data scarcity and cellular heterogeneity. Here, we benchmarked classical and deep generative synthetic data augmentation strategies, including SMOTE, GANs, VAEs, Normalizing Flows, and DDPMs, for supervised classification of both electrophysiological (e-type) and morpho-electrophysiological (mee-type) neuron types from the mouse visual cortex. Using a curated dataset annotated with 48 electrophysiological and 24 morphological features, we established baseline classifiers and then augmented the training data with synthetic samples generated by each method. Our results demonstrate that SMOTE-based augmentation yields the highest classification accuracies (absolute gains of 0.16 for e-types and 0.12 for mee-types), outperforming deep generative models. GANs approached similar performance when hyperparameters and sample sizes were optimized, but were more sensitive to model specification. In addition, we assessed synthetic neuron fidelity by comparing the mean absolute errors between synthetic and real class profiles against the natural phenotypic variability observed between real neuronal classes.
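As a rough illustration of the two ideas summarized above (SMOTE-style interpolation and an MAE-based fidelity check against between-class variability), the following minimal sketch uses only the Python standard library and a made-up toy dataset. It is not the authors' pipeline: the function names (smote_like, class_mean, mae), the feature values, and the choice of k are all assumptions for illustration, and a real analysis would more likely rely on an established implementation such as the SMOTE class in imbalanced-learn.

```python
import random


def smote_like(samples, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen sample and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # Rank the other samples by squared Euclidean distance to the base.
        others = [s for s in samples if s is not base]
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        t = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def class_mean(samples):
    """Per-feature mean profile of a class."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def mae(u, v):
    """Mean absolute error between two feature profiles."""
    return sum(abs(a - b) for a, b in zip(u, v)) / len(u)


# Toy data (hypothetical values): a small class A and a larger class B,
# each described by three features.
real_a = [[0.1, 0.9, 0.4], [0.3, 0.7, 0.5], [0.2, 0.8, 0.6], [0.4, 0.6, 0.3]]
real_b = [[5.0, 4.2, 6.1], [5.5, 4.0, 5.8], [4.8, 4.5, 6.3],
          [5.2, 4.1, 6.0], [5.1, 4.4, 5.9], [4.9, 4.3, 6.2]]

# Oversample class A, then compare the synthetic class profile with the real
# one, using the distance between the two real classes as the baseline.
synthetic_a = smote_like(real_a, n_new=8, k=3, seed=42)
synth_error = mae(class_mean(synthetic_a), class_mean(real_a))
between_class = mae(class_mean(real_a), class_mean(real_b))

print(f"MAE synthetic A vs real A: {synth_error:.3f}")
print(f"MAE real A vs real B:      {between_class:.3f}")
```

Because each synthetic point is a convex combination of two real samples from the same class, the synthetic profile of class A stays inside the range spanned by the real class A data, so in this toy setting its MAE to the real profile is much smaller than the MAE between the two real classes. That comparison mirrors, in a very simplified form, the idea of judging synthetic fidelity relative to natural between-class variability.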
