Learning Model Representations Using Publicly Available Model Hubs
Damian Falk, Konstantin Schürholt, Konstantinos Tzevelekakis, Léo Meynent, Damian Borth
TL;DR
This work tackles the dependence of weight-space learning on curated, resource-intensive model zoos by training a single architecture- and dataset-agnostic weight-space backbone directly on the weights of Hugging Face models, totaling 171B parameters. It introduces Masked Loss Normalization and an efficient tokenization with sinusoidal positional encodings to handle heterogeneity and scale, which enables robust training on wild, uncurated model collections. Empirical results show that the HF-trained backbone matches or outperforms model-zoo baselines across diverse datasets and architectures, and that it generalizes to out-of-distribution tasks such as GPT-2 initialization. The findings demonstrate that high-quality weight-space representations can be learned in the wild, which reduces the reliance on curated zoos and broadens the practical impact of weight-space learning for downstream discriminative and generative tasks.
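To make the tokenization idea more concrete, the following is a minimal Python sketch of one generic way to split a flattened weight vector into fixed-size tokens and add the standard sinusoidal positional encodings of Vaswani et al. (2017). It is not the authors' implementation. The function names (`tokenize_weights`, `sinusoidal_encoding`, `encode`), the zero-padding scheme, and adding encodings directly to raw token values (a real backbone would typically project tokens to a model dimension first) are illustrative assumptions. The sketch does not implement Masked Loss Normalization.

```python
import math
from typing import List, Tuple


def tokenize_weights(flat_weights: List[float], token_size: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Split a flattened weight vector into fixed-size tokens (hypothetical helper).

    The final token is zero-padded; the mask marks real entries (1) vs padding (0).
    """
    tokens, masks = [], []
    for start in range(0, len(flat_weights), token_size):
        chunk = flat_weights[start:start + token_size]
        pad = token_size - len(chunk)
        tokens.append(chunk + [0.0] * pad)
        masks.append([1] * len(chunk) + [0] * pad)
    return tokens, masks


def sinusoidal_encoding(position: int, dim: int) -> List[float]:
    """Standard sinusoidal positional encoding for a single position."""
    enc = []
    for i in range(dim):
        # Each (even, odd) pair of dimensions shares one frequency.
        freq = 1.0 / (10000 ** ((2 * (i // 2)) / dim))
        angle = position * freq
        # sin on even indices, cos on odd indices.
        enc.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return enc


def encode(flat_weights: List[float], token_size: int) -> Tuple[List[List[float]], List[List[int]]]:
    """Tokenize weights and add a positional encoding to each token (assumption:
    encodings are added to raw values, with no learned projection)."""
    tokens, masks = tokenize_weights(flat_weights, token_size)
    encoded = []
    for pos, tok in enumerate(tokens):
        pe = sinusoidal_encoding(pos, token_size)
        encoded.append([w + p for w, p in zip(tok, pe)])
    return encoded, masks
```

Sinusoidal encodings are computed from the position rather than looked up in a learned table, so they extend to any number of tokens. This property is plausibly useful when models of very different sizes share one backbone. For example, `tokenize_weights([0.1, 0.2, 0.3, 0.4, 0.5], 2)` yields three tokens, and the last one is padded to `[0.5, 0.0]` with mask `[1, 0]`.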
Abstract
The weights of neural networks have emerged as a novel data modality, giving rise to the field of weight space learning. A central challenge in this area is that learning meaningful representations of weights typically requires large, carefully constructed collections of trained models, commonly referred to as model zoos. These model zoos are often trained ad hoc and demand large computational resources, which constrains the scale and flexibility of the learned weight space representations. In this work, we drop this requirement by training a weight space learning backbone on arbitrary models downloaded from large, unstructured model repositories such as Hugging Face. Unlike curated model zoos, these repositories contain highly heterogeneous models: they vary in architecture and training dataset, and are largely undocumented. To address the methodological challenges posed by such heterogeneity, we propose a new weight space backbone designed to handle unstructured model populations. We demonstrate that weight space representations trained on models from Hugging Face achieve strong performance, often outperforming backbones trained on laboratory-generated model zoos. Finally, we show that the diversity of the model weights in our training set allows our weight space model to generalize to unseen data modalities. By demonstrating that high-quality weight space representations can be learned in the wild, we show that curated model zoos are not indispensable, thereby removing a major limitation currently faced by the weight space learning community.
