
NeuroADDA: Active Discriminative Domain Adaptation in Connectomic

Shashata Sawmya, Thomas L. Athey, Gwyneth Liu, Nir Shavit

TL;DR

This work tackles transfer learning for neuron segmentation in connectomics: it uses distributional distance metrics to select the most transferable pretrained source domain, then applies active learning to adapt with minimal labels. The proposed NeuroADDA framework combines optimal domain selection based on $MMD^2$ with source-free active sampling (median-uncertainty) and outperforms training from scratch across six diverse EM datasets, especially at low annotation budgets. A key finding is that $MMD^2$-driven domain distances correlate with phylogenetic relationships, enabling biologically informed transfer and clustering of datasets. The approach substantially reduces annotation effort (e.g., a 25–67% reduction in Variation of Information (VI) relative to training from scratch at $n=4$ fine-tuning samples) and offers a practical workflow for scalable, cross-species connectomic segmentation.
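The two steps in the TL;DR (pick the source with the smallest $MMD^2$, then choose target images to label by uncertainty) can be sketched in NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each candidate source's pretrained encoder has already embedded both source and target images into feature vectors, uses a Gaussian kernel with a median-heuristic bandwidth for the unbiased $MMD^2$ estimate, and reads "median-uncertainty" as selecting the target images whose mean per-pixel binary entropy is closest to the median. All function names and the input format are hypothetical.

```python
from typing import Optional

import numpy as np


def pairwise_sq_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Squared Euclidean distances between rows of a (m, d) and rows of b (n, d)."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d2, 0.0)  # clip tiny negatives caused by round-off


def mmd2_unbiased(x: np.ndarray, y: np.ndarray, sigma: Optional[float] = None) -> float:
    """Unbiased squared MMD between feature sets x (m, d) and y (n, d).

    Assumption: Gaussian kernel; if sigma is None, the bandwidth is the median
    pairwise distance of the pooled samples (median heuristic).
    """
    m, n = len(x), len(y)
    if sigma is None:
        z = np.vstack([x, y])
        iu = np.triu_indices(len(z), k=1)
        sigma = float(np.sqrt(np.median(pairwise_sq_dists(z, z)[iu])))

    def kernel(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        return np.exp(-pairwise_sq_dists(a, b) / (2.0 * sigma**2))

    k_xx, k_yy, k_xy = kernel(x, x), kernel(y, y), kernel(x, y)
    # Within-sample means skip the i == j diagonal, which keeps the estimate unbiased.
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())


def select_source(pairs: dict[str, tuple[np.ndarray, np.ndarray]]) -> str:
    """Pick the source whose features are closest to the target in MMD^2.

    pairs maps a source name to (source_feats, target_feats), both embedded by
    that source's pretrained encoder (hypothetical input format).
    """
    scores = {name: mmd2_unbiased(src, tgt) for name, (src, tgt) in pairs.items()}
    return min(scores, key=scores.get)


def image_uncertainty(prob_maps: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Mean per-pixel binary entropy per image; prob_maps has shape (N, ...).

    Assumption: prob_maps are the pretrained model's boundary/affinity probabilities.
    """
    p = np.clip(prob_maps, eps, 1.0 - eps)
    ent = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    return ent.reshape(len(ent), -1).mean(axis=1)


def median_uncertainty_sample(scores: np.ndarray, n: int) -> np.ndarray:
    """Indices of the n images whose uncertainty is closest to the median score.

    Assumed reading of "median-uncertainty": avoid the most uncertain images,
    which tend to be artifacts or atypical regions.
    """
    gap = np.abs(scores - np.median(scores))
    return np.argsort(gap, kind="stable")[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(0.0, 1.0, size=(200, 16))  # stand-in target features
    pairs = {
        "near_source": (rng.normal(0.1, 1.0, size=(200, 16)), target),
        "far_source": (rng.normal(2.0, 1.0, size=(200, 16)), target),
    }
    print("selected source:", select_source(pairs))
    probs = rng.uniform(size=(50, 32, 32))  # stand-in per-pixel probabilities
    print("images to annotate:", median_uncertainty_sample(image_uncertainty(probs), n=4))
```

Picking images near the median rather than the maximum uncertainty is one plausible way to sidestep the outliers highlighted in Figure 2; the actual selection rule may differ.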

Abstract

Training segmentation models from scratch has been the standard approach for new electron microscopy connectomics datasets. However, leveraging pretrained models from existing datasets could improve efficiency and performance under a constrained annotation budget. In this study, we investigate domain adaptation in connectomics by analyzing six major datasets spanning different organisms. We show that the Maximum Mean Discrepancy (MMD) between neuron image distributions serves as a reliable indicator of transferability and identifies the optimal source domain for transfer learning. Building on this, we introduce NeuroADDA, a method that combines optimal domain selection with source-free active learning to effectively adapt pretrained backbones to a new dataset. NeuroADDA consistently outperforms training from scratch across diverse datasets and fine-tuning sample sizes, with the largest gain at $n=4$ samples, where it reduces Variation of Information by 25–67%. Finally, our analysis of distributional differences among neuron images from multiple species in a learned feature space reveals that these domain "distances" correlate with phylogenetic distance among those species.
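Variation of Information, the metric behind the reported 25–67% reduction, can be computed from label co-occurrence counts as $VI(S, G) = H(S \mid G) + H(G \mid S)$, where the first term grows with split errors and the second with merge errors. The sketch below is a minimal NumPy version assuming integer label volumes of equal shape and natural logarithms; it does not mask background or boundary voxels, which many connectomics benchmarks do. The function names are hypothetical.

```python
import numpy as np


def entropy_from_counts(counts: np.ndarray) -> float:
    """Shannon entropy (nats) of the distribution given by positive counts."""
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())


def variation_of_information(seg: np.ndarray, gt: np.ndarray) -> tuple[float, float, float]:
    """Return (split, merge, total) VI between two integer label volumes of equal shape.

    split = H(seg | gt) grows when one true object is cut into several segments;
    merge = H(gt | seg) grows when several true objects share one segment.
    Assumption: every voxel counts (no background or boundary masking).
    """
    seg = np.asarray(seg).ravel()
    gt = np.asarray(gt).ravel()
    if seg.shape != gt.shape:
        raise ValueError("seg and gt must contain the same number of voxels")
    h_seg = entropy_from_counts(np.unique(seg, return_counts=True)[1])
    h_gt = entropy_from_counts(np.unique(gt, return_counts=True)[1])
    # Joint entropy from counts of co-occurring (seg label, gt label) pairs.
    joint = np.stack([seg, gt], axis=1)
    h_joint = entropy_from_counts(np.unique(joint, axis=0, return_counts=True)[1])
    split = h_joint - h_gt
    merge = h_joint - h_seg
    return split, merge, split + merge


def relative_reduction(baseline_vi: float, new_vi: float) -> float:
    """Fractional VI reduction of a method relative to a baseline (e.g. scratch training)."""
    return (baseline_vi - new_vi) / baseline_vi


if __name__ == "__main__":
    gt = np.array([[1, 1, 2, 2], [1, 1, 2, 2]])
    over = np.array([[1, 3, 2, 2], [1, 3, 2, 2]])   # splits object 1 into two segments
    under = np.array([[1, 1, 1, 1], [1, 1, 1, 1]])  # merges objects 1 and 2
    print("split-heavy:", variation_of_information(over, gt))
    print("merge-heavy:", variation_of_information(under, gt))
```

VI is invariant to how labels are numbered, so relabeled but otherwise identical segmentations score zero.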

Paper Structure

This paper contains 11 sections, 4 equations, 3 figures, 3 tables, 1 algorithm.

Figures (3)

  • Figure 1: (A) Capturing the distribution shift between source and target images using the source-pretrained UNET encoder. The distance is measured using squared Maximum Mean Discrepancy ($MMD^2$). (B) The $MMD^2$ distance matrix illustrates the pairwise distribution shift between datasets; higher values indicate greater domain discrepancy. (C) Mean Variation of Information (lower is better) for NeuroADDA (green) and scratch training (blue) across six datasets, showing that NeuroADDA consistently achieves better performance and lower variance than scratch training at small training sample sizes.
  • Figure 2: Examples of the most uncertain images under each source-to-target dataset transfer, accompanied by their corresponding uncertainty scores. In all transfer scenarios, high-uncertainty samples tend to be either image artifacts (e.g., severe brightness or contrast abnormalities) or biologically atypical regions (e.g., unusual structural patterns), indicating that such outliers pose the greatest challenge for model adaptation across datasets.
  • Figure 3: (A) Symmetrized MMD distance matrix and agglomerative clustering dendrogram across six major connectomics datasets, with darker squares representing greater domain distance. (B) Comparison between the resulting MMD‐based clustering and known phylogenetic relationships, highlighting that the two clusterings align significantly at the phylum level.
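The Figure 3A pipeline (symmetrize the MMD matrix, then cluster the datasets agglomeratively) can be sketched with SciPy. Assumptions: the raw matrix is asymmetric, presumably because each row is computed with a different source-pretrained encoder; symmetrization averages the matrix with its transpose; and average linkage is used. The function names, dataset names, and matrix values in the demo are hypothetical, not the paper's data.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage
from scipy.spatial.distance import squareform


def symmetrize(mmd: np.ndarray) -> np.ndarray:
    """Average an asymmetric dataset-by-dataset MMD^2 matrix with its transpose."""
    sym = 0.5 * (mmd + mmd.T)
    sym = np.maximum(sym, 0.0)  # unbiased MMD^2 estimates can fall slightly below zero
    np.fill_diagonal(sym, 0.0)  # a dataset is at distance zero from itself
    return sym


def cluster_domains(mmd: np.ndarray, n_clusters: int, method: str = "average"):
    """Agglomerative clustering of datasets from a pairwise MMD^2 matrix.

    Returns SciPy's linkage matrix and one flat cluster label per dataset.
    The default "average" linkage is an assumption.
    """
    condensed = squareform(symmetrize(mmd))  # square matrix -> condensed distance vector
    z = linkage(condensed, method=method)
    labels = fcluster(z, t=n_clusters, criterion="maxclust")
    return z, labels


if __name__ == "__main__":
    # Hypothetical values; assumed layout: row i = source encoder i, column j = target j.
    names = ["A", "B", "C", "D"]
    mmd = np.array([
        [0.00, 0.10, 0.90, 0.80],
        [0.20, 0.00, 0.85, 0.95],
        [0.70, 0.90, 0.00, 0.15],
        [0.80, 0.90, 0.10, 0.00],
    ])
    z, labels = cluster_domains(mmd, n_clusters=2)
    print(dict(zip(names, labels.tolist())))
    print("dendrogram leaf order:", dendrogram(z, no_plot=True, labels=names)["ivl"])
```

The flat labels (or the dendrogram itself) could then be set against phylum assignments for the same datasets, as in Figure 3B.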