The Impact of Model Zoo Size and Composition on Weight Space Learning
Damian Falk, Konstantin Schürholt, Damian Borth
TL;DR
This paper extends weight-space learning (SANE) to heterogeneous model zoos, showing that diversity in architecture and pre-training data significantly enhances zero-shot generation of unseen network weights. It introduces masked per-token loss normalization to enable stable training across inhomogeneous weight distributions, and demonstrates that multi-zoo training improves both in-distribution and out-of-distribution transfer compared to single-zoo baselines and naive weight averaging. Through extensive experiments on CNN and ResNet-18 zoos across multiple image datasets, the work shows that diversity and larger zoo size yield substantial gains, with generated weights often serving as strong initializations for downstream fine-tuning. The findings underscore the value of population diversity in weight space and provide a practical framework for generating transferable neural networks without direct data-driven weight initialization.
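To make the idea of masked per-token loss normalization more concrete, the following is a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes that weights are split into fixed-length tokens padded with zeros, that a binary mask marks the valid (non-padded) entries, and that each token's reconstruction error is divided by the masked variance of its target, so that tokens from layers or zoos with very different weight scales contribute comparably. The function name `masked_per_token_normalized_mse`, the variance-based normalizer, and the `eps` parameter are all illustrative assumptions.

```python
import numpy as np


def masked_per_token_normalized_mse(pred, target, mask, eps=1e-8):
    """Hypothetical sketch: masked MSE per token, scaled by target variance.

    pred, target, mask: arrays of shape (num_tokens, token_dim).
    mask is 1 for valid weight entries and 0 for padding.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)

    # Number of valid entries per token; guard against fully padded tokens.
    n_valid = mask.sum(axis=-1)
    safe_n = np.maximum(n_valid, 1.0)

    # Masked mean and variance of the target, computed per token.
    mean = (target * mask).sum(axis=-1) / safe_n
    var = (((target - mean[..., None]) ** 2) * mask).sum(axis=-1) / safe_n

    # Masked squared error per token; padded positions contribute nothing.
    sq_err = (((pred - target) ** 2) * mask).sum(axis=-1) / safe_n

    # Assumed normalization: divide each token's error by its own variance.
    per_token = sq_err / (var + eps)

    # Average only over tokens that contain at least one valid entry.
    token_valid = n_valid > 0
    if not token_valid.any():
        return 0.0
    return float(per_token[token_valid].mean())
```

Under these assumptions the loss is approximately invariant to rescaling a token's weights and ignores whatever the model predicts in padded positions, which is the property one would want when mixing zoos whose weight distributions differ.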
Abstract
Re-using trained neural network models is a common strategy to reduce training cost and transfer knowledge. Weight space learning, which treats the weights of trained models as a data modality, is a promising new field for re-using populations of pre-trained models for future tasks. Approaches in this field have demonstrated high performance on both model analysis and weight generation tasks. However, until now their learning setup has required homogeneous model zoos in which all models share the exact same architecture, limiting their capability to generalize beyond the population of models seen during training. In this work, we remove this constraint and propose a modification to a common weight space learning method to accommodate training on heterogeneous populations of models. We further investigate the resulting impact of model diversity on generating unseen neural network model weights for zero-shot knowledge transfer. Our extensive experimental evaluation shows that including models trained on varying underlying image datasets has a high impact on performance and generalization, in both in- and out-of-distribution settings. Code is available at github.com/HSG-AIML/MultiZoo-SANE.
