Hyper-Representations: Learning from Populations of Neural Networks

Konstantin Schürholt

TL;DR

This thesis demonstrates that hyper-representations uncover model properties, such as their performance, state of training, or hyperparameters, and presents methods that allow hyper-representations to generalize beyond model sizes, architectures, and tasks.

Abstract

This thesis addresses the challenge of understanding Neural Networks through the lens of their most fundamental component: the weights, which encapsulate the learned information and determine the model behavior. At its core is a fundamental question: can we learn general, task-agnostic representations from populations of Neural Network models? The key contribution of this thesis toward answering that question is hyper-representations, a self-supervised method to learn representations of NN weights. Work in this thesis finds that trained NN models indeed occupy meaningful structures in the weight space that can be learned and used. Through extensive experiments, this thesis demonstrates that hyper-representations uncover model properties, such as their performance, state of training, or hyperparameters. Moreover, identifying regions with specific properties in hyper-representation space makes it possible to sample and generate model weights with targeted properties. This thesis demonstrates successful applications of such generated weights to fine-tuning and transfer learning. Lastly, it presents methods that allow hyper-representations to generalize beyond model sizes, architectures, and tasks. The practical implications are profound, as these methods open the door to foundation models of Neural Networks, which aggregate and instantiate their knowledge across models and architectures. Ultimately, this thesis contributes to a deeper understanding of Neural Networks by investigating structures in their weights, which leads to more interpretable, efficient, and adaptable models. By laying the groundwork for representation learning of NN weights, this research demonstrates the potential to change the way Neural Networks are developed, analyzed, and used.

Paper Structure (89 sections, 28 equations, 53 figures, 48 tables, 3 algorithms)


Figures (53)

  • Figure 1.0.1: Overview of the landscape spanned by applications for and perspectives on Neural Network models. Due to the rich information and application potential, the focus of this thesis lies on NN weights for model analysis and generation.
  • Figure 1.0.2: Overview of the contributions of the thesis. (a): Chapter \ref{chap::weight_space} investigates local and global structure in weight spaces as well as the potential and challenges of operations on NN weights. (b): Chapter \ref{chap::model_zoos} proposes a blueprint for diverse populations of NNs as a dataset for the work in this thesis. (c): Chapter \ref{chap::hyper_reps} introduces hyper-representations as a self-supervised representation learning method on NN weights, as well as NN weight augmentation methods. (d): Chapter \ref{chap::generative_hyper_reps} extends hyper-representations for model generation. (e): Chapter \ref{chap::scalable_hyper_reps} proposes methods to scale hyper-representations to large models and diverse architectures.
  • Figure 2.3.1: Weight entropy over training epochs for populations of CNNs (with $\sim 12k$ parameters), AlexNet and ResNet-18 models trained on CIFAR10. Weight entropy is approximated via the empirical spectral density of the weight matrices. During training, the entropy of weight matrices decreases as order is induced in the weights.
  • Figure 2.6.1: Absolute correlation coefficient between pairwise CKA and $l_2$ similarity scores over the number of permutations of 15 models. Increasing the number of permutations increases the number of pairs and the global coverage of the weight space.
  • Figure 2.6.2: Correlation between pairwise CKA and cosine similarity over the number of models. Increasing the number of models increases the global coverage of the weight space.
  • ...and 48 more figures