Neural Coherence: Find higher performance to out-of-distribution tasks from few samples
Simon Guiroy, Mats Richter, Sarath Chandar, Christopher Pal
TL;DR
Neural Coherence introduces a data-efficient, activation-trajectory-based framework for model and data selection under distribution shift. By tracking multi-layer activation statistics across training hyperparameters and contrasting source and target trajectories, it identifies optimal checkpoints and training data with only a few unlabeled target samples. The approach is instantiated for checkpoint selection and data selection, and validated across meta-learning, zero-shot generalization, and transfer learning on large vision models, showing substantial improvements over traditional validation-based methods and several baselines. The work highlights activation dynamics as a rich signal for generalization under domain shift and offers a practical, architecture-agnostic criterion for robust pre-training and fine-tuning decisions.
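The idea of contrasting source and target activation trajectories can be illustrated with a toy sketch. This is not the authors' method. Each checkpoint is summarized by the per-layer mean and standard deviation of its activations, and these summaries are stacked into a trajectory for each domain. The sketch then measures how well the target trajectory's step-to-step movement agrees with the source's. The stopping rule in `select_checkpoint` is a hypothetical stand-in for the paper's criterion: it keeps the last checkpoint reached before source and target statistics first move in disagreeing directions. All function names, the choice of summary statistics, and the `threshold` parameter are illustrative assumptions.

```python
import numpy as np


def layer_stats(activations):
    """Summarize one checkpoint by per-layer activation mean and std.

    `activations` is a list with one array per layer, each shaped
    (n_samples, n_units). Returns [mean_1, std_1, ..., mean_L, std_L].
    """
    feats = []
    for a in activations:
        a = np.asarray(a, dtype=float)
        feats.extend([a.mean(), a.std()])
    return np.array(feats)


def trajectory(per_checkpoint_activations):
    """Stack per-checkpoint summaries into a (T, 2L) trajectory array."""
    return np.stack([layer_stats(acts) for acts in per_checkpoint_activations])


def step_coherence(src_traj, tgt_traj, eps=1e-12):
    """Cosine similarity between source and target displacements.

    Entry t compares the move from checkpoint t to t+1 in both domains.
    """
    d_src = np.diff(src_traj, axis=0)
    d_tgt = np.diff(tgt_traj, axis=0)
    num = (d_src * d_tgt).sum(axis=1)
    den = np.linalg.norm(d_src, axis=1) * np.linalg.norm(d_tgt, axis=1) + eps
    return num / den


def select_checkpoint(src_traj, tgt_traj, threshold=0.0):
    """Hypothetical rule (not from the paper): return the last checkpoint
    reached while every step moved source and target statistics in
    agreeing directions (coherence >= threshold)."""
    coh = step_coherence(src_traj, tgt_traj)
    below = np.flatnonzero(coh < threshold)
    # Step t goes from checkpoint t to t+1, so stop at checkpoint t
    # when step t is the first incoherent one.
    return int(below[0]) if below.size else len(src_traj) - 1
```

In practice, the per-layer activations would come from forward passes of each saved checkpoint on source samples and on the few unlabeled target samples. Those arrays would be passed to `trajectory` once per domain. Mean and standard deviation are used here only because they are cheap and label-free. Any per-layer statistic could be substituted.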
Abstract
To create state-of-the-art models for many downstream tasks, it has become common practice to fine-tune a pre-trained large vision model. However, it remains an open question how best to determine which of the many possible model checkpoints resulting from a large training run to use as the starting point. This becomes especially important when data for the target task of interest is scarce, unlabeled and out-of-distribution. In such scenarios, common methods relying on in-distribution validation data become unreliable or inapplicable. This work proposes a novel approach for model selection that operates reliably on just a few unlabeled examples from the target task. Our approach is based on a novel concept, Neural Coherence, which entails characterizing a model's activation statistics for source and target domains and allows one to define model selection methods with high data efficiency. We provide experiments where models are pre-trained on ImageNet-1K and examine target domains consisting of Food-101, PlantNet-300K and iNaturalist. We also evaluate Neural Coherence in many meta-learning settings. Our approach significantly improves generalization across these different target domains compared to established baselines. We further demonstrate the versatility of Neural Coherence as a powerful principle by showing its effectiveness in training data selection.
