NeuroADDA: Active Discriminative Domain Adaptation in Connectomic
Shashata Sawmya, Thomas L. Athey, Gwyneth Liu, Nir Shavit
TL;DR
This work tackles transfer learning for neuron segmentation in connectomics: distributional distance metrics select the most transferable pretrained source domain, and active learning then adapts the model with minimal labels. The proposed NeuroADDA framework combines optimal domain selection based on $\mathrm{MMD}^2$ with source-free active sampling (median-uncertainty) and outperforms training from scratch across six diverse EM datasets, especially at low annotation budgets. A key finding is that $\mathrm{MMD}^2$-driven domain distances correlate with phylogenetic relationships, enabling biologically informed transfer and clustering of datasets. The approach reduces annotation effort (e.g., a 25–67\% reduction in Variation of Information (VI) at $n=4$ samples) and offers a practical workflow for scalable, cross-species connectomic segmentation.
Abstract
Training segmentation models from scratch has been the standard approach for new electron microscopy connectomics datasets. However, leveraging pretrained models from existing datasets could improve efficiency and performance under constrained annotation budgets. In this study, we investigate domain adaptation in connectomics by analyzing six major datasets spanning different organisms. We show that the Maximum Mean Discrepancy (MMD) between neuron image distributions serves as a reliable indicator of transferability and identifies the optimal source domain for transfer learning. Building on this, we introduce NeuroADDA, a method that combines optimal domain selection with source-free active learning to effectively adapt pretrained backbones to a new dataset. NeuroADDA consistently outperforms training from scratch across diverse datasets and fine-tuning sample sizes, with the largest gain observed at $n=4$ samples: a 25–67\% reduction in Variation of Information. Finally, our analysis of distributional differences among neuron images from multiple species in a learned feature space reveals that these domain "distances" correlate with the phylogenetic distance among those species.
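To make the source-selection step concrete, the following is a minimal sketch of how a squared MMD could be estimated between feature sets and used to pick the closest source domain. It is not the authors' implementation: the RBF kernel, the bandwidth `gamma`, the biased (V-statistic) estimator, and the function names `rbf_kernel`, `mmd2`, and `select_source` are all assumptions made for illustration, and the inputs are assumed to be feature vectors already extracted from neuron images.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between rows of a and rows of b (kernel choice is an assumption)."""
    # Pairwise squared Euclidean distances, clipped at 0 to absorb rounding error.
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = rbf_kernel(x, x, gamma).mean()
    k_yy = rbf_kernel(y, y, gamma).mean()
    k_xy = rbf_kernel(x, y, gamma).mean()
    return k_xx + k_yy - 2.0 * k_xy


def select_source(target_feats, source_feats, gamma=1.0):
    """Return the name of the source with the smallest MMD^2 to the target, plus all distances.

    source_feats is a hypothetical dict mapping a dataset name to an (n, d) feature array.
    """
    dists = {name: mmd2(feats, target_feats, gamma) for name, feats in source_feats.items()}
    best = min(dists, key=dists.get)
    return best, dists


if __name__ == "__main__":
    # Synthetic stand-ins for learned features; real features would come from a pretrained backbone.
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, size=(200, 8))
    sources = {
        "near": rng.normal(0.2, 1.0, size=(200, 8)),
        "far": rng.normal(2.0, 1.0, size=(200, 8)),
    }
    best, dists = select_source(target, sources, gamma=1.0 / 16.0)
    print(best, dists)
```

In this sketch the source with the smallest estimated $\mathrm{MMD}^2$ to the target is treated as the most transferable one, mirroring the selection idea described above; the same pairwise distances could also be arranged into a matrix for clustering datasets.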
