Learning Model Representations Using Publicly Available Model Hubs

Damian Falk, Konstantin Schürholt, Konstantinos Tzevelekakis, Léo Meynent, Damian Borth

TL;DR

This work addresses the resource-intensive need for curated model zoos in weight-space learning by training a single, architecture- and dataset-agnostic weight-space backbone directly on weights from Hugging Face models, totaling 171B parameters. It introduces Masked Loss Normalization and efficient tokenization with sinusoidal positional encodings to handle heterogeneity and scale, enabling robust training on wild, uncurated model collections. Empirical results show the HF-trained backbone achieves competitive or superior performance compared to model-zoo baselines across diverse datasets and architectures, and can generalize to out-of-distribution tasks such as GPT-2 initialization. The findings demonstrate that high-quality weight-space representations can be learned in the wild, reducing the reliance on curated zoos and broadening the practical impact of weight-space learning for downstream discriminative and generative tasks.
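The summary above names tokenization with sinusoidal positional encodings but does not spell out the details. As a rough illustration only, the sketch below shows one common way such a pipeline can look: a flat weight vector is cut into fixed-size tokens (zero-padded, with a mask marking real entries), and each token position gets a standard Transformer-style sinusoidal encoding. The function names, the `token_size` parameter, and the padding scheme are assumptions made for this example, not the authors' implementation.

```python
import math


def tokenize_weights(flat_weights, token_size):
    """Split a flat list of weights into fixed-size tokens (hypothetical helper).

    The last token is zero-padded. Returns (tokens, mask), where mask holds
    1 for real weight entries and 0 for padding.
    """
    tokens, mask = [], []
    for start in range(0, len(flat_weights), token_size):
        chunk = list(flat_weights[start:start + token_size])
        pad = token_size - len(chunk)
        tokens.append(chunk + [0.0] * pad)
        mask.append([1] * len(chunk) + [0] * pad)
    return tokens, mask


def sinusoidal_encoding(position, dim, base=10000.0):
    """Standard sinusoidal positional encoding for a single position.

    Even indices hold sin(position / base**(i / dim)), odd indices the
    matching cosine, as in the original Transformer formulation.
    """
    enc = []
    for i in range(0, dim, 2):
        angle = position / (base ** (i / dim))
        enc.append(math.sin(angle))
        if i + 1 < dim:
            enc.append(math.cos(angle))
    return enc


# Example: five weights, tokens of size 2 (the last token is padded).
tokens, mask = tokenize_weights([0.1, -0.2, 0.3, 0.4, 0.5], token_size=2)
positions = [sinusoidal_encoding(pos, dim=4) for pos in range(len(tokens))]
```

Because the encoding is a fixed function of the position rather than a learned table, it can be evaluated for any sequence length, which is one plausible reason to prefer it when models of very different sizes share a backbone. The mask is what a masked loss would use to ignore padded entries.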

Abstract

The weights of neural networks have emerged as a novel data modality, giving rise to the field of weight space learning. A central challenge in this area is that learning meaningful representations of weights typically requires large, carefully constructed collections of trained models, commonly referred to as model zoos. These model zoos are often trained ad hoc, which requires large computational resources and constrains the learned weight space representations in scale and flexibility. In this work, we drop this requirement by training a weight space learning backbone on arbitrary models downloaded from large, unstructured model repositories such as Hugging Face. Unlike curated model zoos, these repositories contain highly heterogeneous models: they vary in architecture and dataset, and are largely undocumented. To address the methodological challenges posed by such heterogeneity, we propose a new weight space backbone designed to handle unstructured model populations. We demonstrate that weight space representations trained on models from Hugging Face achieve strong performance, often outperforming backbones trained on laboratory-generated model zoos. Finally, we show that the diversity of the model weights in our training set allows our weight space model to generalize to unseen data modalities. By demonstrating that high-quality weight space representations can be learned in the wild, we show that curated model zoos are not indispensable, thereby overcoming a strong limitation currently faced by the weight space learning community.

Paper Structure

This paper contains 50 sections, 1 equation, 8 figures, 8 tables.

Figures (8)

  • Figure 1: An overview of the proposed method. We train a weight space representation directly from the weights of models downloaded from Hugging Face. These models are, to a large extent, undocumented, trained on various datasets, and composed of different neural network architectures. Once a representation is learned from such a heterogeneous model collection, it can be exploited for multiple downstream tasks: either analyzing or generating model weights for multiple architectures and target datasets. Note that all of this is accomplished using the same single representation trained entirely on HF models.
  • Figure 2: Accuracy of generated ResNet-18 models on the respective target image datasets. No trainable parameters are updated before the performance evaluation. We compare training on homogeneous model zoos using SANE with MLN to training the backbone on HF models. HF (S) and HF (L) designate the small and large versions of the backbone, respectively. With the exception of CIFAR-10, our approach outperforms model zoo training on all datasets.
  • Figure 3: Accuracy of generated models on ImageNet-1K after a few epochs of finetuning. We show the performance of three different architecture types as well as the overall mean$\pm$std accuracy over all generated models, compared to training from scratch.
  • Figure 4: Comparison of a generated GPT-2 model on OpenWebText with training from scratch. Results show the performance on the minival split and indicate that our backbone can generalize to a different modality, with improved performance over standard weight initialization.
  • Figure 11: Performance of generated ResNet-18 models with varying backbones. Here, we report the performance of both baselines separately, whereas in the paper we show the maximum performance achieved over both baselines. Furthermore, we show the performance of both the small and the large HF backbone. The results indicate that training on HF models is feasible and outperforms the baselines, with the exception of CIFAR-10.
  • ...and 3 more figures