The Impact of Model Zoo Size and Composition on Weight Space Learning

Damian Falk, Konstantin Schürholt, Damian Borth

TL;DR

This paper extends weight-space learning (SANE) to heterogeneous model zoos, showing that diversity in architecture and pre-training data significantly enhances zero-shot generation of unseen network weights. It introduces masked per-token loss normalization to enable stable training across inhomogeneous weight distributions, and demonstrates that multi-zoo training improves both in-distribution and out-of-distribution transfer compared to single-zoo baselines and naive weight averaging. Through extensive experiments on CNN and ResNet-18 zoos across multiple image datasets, the work shows that diversity and larger zoo size yield substantial gains, with generated weights often serving as strong initializations for downstream fine-tuning. The findings underscore the value of population diversity in weight space and provide a practical framework for generating transferable neural networks without direct data-driven weight initialization.

Abstract

Re-using trained neural network models is a common strategy to reduce training cost and transfer knowledge. Weight space learning, which uses the weights of trained models as a data modality, is a promising new field for re-using populations of pre-trained models for future tasks. Approaches in this field have demonstrated high performance on both model analysis and weight generation tasks. However, until now their learning setup has required homogeneous model zoos in which all models share the exact same architecture, limiting their capability to generalize beyond the population of models they saw during training. In this work, we remove this constraint and propose a modification to a common weight space learning method to accommodate training on heterogeneous populations of models. We further investigate the resulting impact of model diversity on generating unseen neural network model weights for zero-shot knowledge transfer. Our extensive experimental evaluation shows that including models with varying underlying image datasets has a high impact on performance and generalization, for both in- and out-of-distribution settings. Code is available at github.com/HSG-AIML/MultiZoo-SANE.

Paper Structure

This paper contains 22 sections, 3 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Test accuracy of model soups as a function of the number of averaged models. Increasing the number of models, whether aligned or not, decreases performance.
  • Figure 2: Comparison of weight distributions of a selection of ResNet layers between original weights (blue/left) and reconstructed weights (right). We compare reconstruction without normalization (orange), with per-token normalization (green) and with masked per-token normalization (red). As in previous work, without normalization, weights of layers with narrow distributions are squashed towards the mean. Normalizing per token fixes that issue. Ignoring the mask introduces a strong bias, particularly for batch-norm layers. Reconstructions with the masked per-token normalization match the original most closely. On the right we show the mean $\pm$ std performance of 10 sampled ResNet-18 models on CIFAR100 with the different normalizations.
  • Figure 3: Comparison of zero-shot performance of sampled models on the downstream image datasets when varying model zoo composition and sample size. SANE is trained with [50/100/150/200/300] samples (2.5M to 15M weight tokens) for 60 epochs using data taken from one to three model zoos.
  • Figure 4: Comparison of SANE to HyperNetworks and random initialization during fine-tuning.
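
The masked per-token normalization contrasted in Figure 2 can be illustrated with a small sketch. It is not taken from the paper's code. It assumes that each weight token is a fixed-length vector padded with zeros, that a binary mask marks the real entries, and that the reconstruction error of each token is scaled by the standard deviation of that token's unmasked entries. The function names and the `eps` parameter are hypothetical.

```python
import math

def masked_token_stats(token, mask):
    """Mean and standard deviation over the unmasked (real) entries of one token."""
    valid = [w for w, m in zip(token, mask) if m]
    if not valid:
        return 0.0, 1.0  # fully padded token: neutral statistics
    mean = sum(valid) / len(valid)
    var = sum((w - mean) ** 2 for w in valid) / len(valid)
    return mean, math.sqrt(var)

def normalize_token(token, mask, eps=1e-8):
    """Standardize the real entries of a token; padded entries stay at zero."""
    mean, std = masked_token_stats(token, mask)
    return [(w - mean) / (std + eps) if m else 0.0 for w, m in zip(token, mask)]

def masked_normalized_mse(pred_tokens, target_tokens, masks, eps=1e-8):
    """Mean squared error on real entries, scaled per token by the target's masked std.

    Normalizing prediction and target with the same (target) statistics reduces
    to dividing their difference by the token's standard deviation.
    """
    total, count = 0.0, 0
    for pred, target, mask in zip(pred_tokens, target_tokens, masks):
        _, std = masked_token_stats(target, mask)
        for p, t, m in zip(pred, target, mask):
            if m:  # padding never contributes to the loss
                total += ((p - t) / (std + eps)) ** 2
                count += 1
    return total / max(count, 1)

# One token with three real weights and one padded position.
token, mask = [1.0, 2.0, 3.0, 0.0], [1, 1, 1, 0]
normalized = normalize_token(token, mask)

# A narrow and a wide token with the same relative reconstruction error.
narrow_loss = masked_normalized_mse([[0.01, 0.02]], [[0.0, 0.02]], [[1, 1]])
wide_loss = masked_normalized_mse([[1.0, 2.0]], [[0.0, 2.0]], [[1, 1]])
```

Under these assumptions the narrow and the wide token yield nearly the same loss, so layers with small weight variance are not dominated by layers with large variance. Computing the statistics over all positions, padding included, would shift the mean and inflate the standard deviation of heavily padded tokens, which is one plausible reading of the bias that Figure 2 reports when the mask is ignored.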