
Set-based Neural Network Encoding Without Weight Tying

Bruno Andreis, Soro Bedionita, Philip H. S. Torr, Sung Ju Hwang

TL;DR

This work tackles predicting neural network properties from trained parameters across architectures. It introduces the Set-based Neural Network Encoder (SNE), which encodes networks of arbitrary architecture by chunking weights into sets, applying set-to-set and set-to-vector operations, and aggregating layer encodings into a fixed-size network representation $z_{x_i}\in\mathbb{R}^h$ that feeds a property predictor. A key novelty is learning minimal weight-space equivariance without weight tying via Logit Invariance Regularization, enabling a single encoder to handle diverse architectures. The authors validate SNE on implicit neural representations and CNN/Transformer model zoos, demonstrating superior cross-dataset and cross-architecture transfer, scalability, and data efficiency, with strong quantitative gains over baselines.
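The two primitives named above, set-to-set and set-to-vector functions, can be illustrated with a DeepSets-style sketch. This is an assumption for illustration only: the mean pooling, the tanh layers, and the names `set_to_set` and `set_to_vector` are hypothetical, and SNE's actual set functions may differ (for example, attention-based set layers). The sketch shows the property that motivates set functions: reordering the elements of a set leaves the pooled vector in $\mathbb{R}^h$ unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

def set_to_set(x, w_self, w_ctx):
    """Permutation-equivariant map (hypothetical DeepSets-style choice): each element
    is combined with the set mean, so permuting the rows of x permutes the output rows."""
    ctx = x.mean(axis=0, keepdims=True)            # (1, d) context shared by all elements
    return np.tanh(x @ w_self + ctx @ w_ctx)       # (n, d_hidden)

def set_to_vector(x, w_out):
    """Permutation-invariant map: mean-pool over elements, then project to R^h."""
    return x.mean(axis=0) @ w_out                  # (h,)

d, d_hidden, h = 4, 8, 16
w_self = rng.normal(size=(d, d_hidden))
w_ctx = rng.normal(size=(d, d_hidden))
w_out = rng.normal(size=(d_hidden, h))

elements = rng.normal(size=(10, d))                # a set of 10 elements, 4 features each
z = set_to_vector(set_to_set(elements, w_self, w_ctx), w_out)

perm = rng.permutation(10)
z_perm = set_to_vector(set_to_set(elements[perm], w_self, w_ctx), w_out)
print(z.shape, np.allclose(z, z_perm))             # (16,) True
```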

Abstract

We propose a neural network weight encoding method for network property prediction that uses set-to-set and set-to-vector functions to efficiently encode neural network parameters. Our approach can encode neural networks in a model zoo of mixed architectures and different parameter sizes, whereas previous approaches require custom encoding models for different architectures. Furthermore, our Set-based Neural network Encoder (SNE) takes into account the hierarchical computational structure of neural networks. To respect symmetries inherent in network weight space, we use Logit Invariance to learn the required minimal invariance properties. Additionally, we introduce a pad-chunk-encode pipeline, adjustable to computational and memory constraints, to efficiently encode neural network layers. We also introduce two new tasks for neural network property prediction: cross-dataset and cross-architecture. In cross-dataset property prediction, we evaluate how well property predictors generalize across model zoos of the same architecture trained on different datasets. In cross-architecture property prediction, we evaluate how well property predictors transfer to model zoos of architectures not seen during training. We show that SNE outperforms the relevant baselines on standard benchmarks.
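A minimal sketch of the pad and chunk steps, assuming a layer's weights are flattened, zero-padded to a multiple of a chunk size, and reshaped into fixed-size chunks. The helper `pad_and_chunk` and the flatten-first strategy are hypothetical simplifications; the paper's pipeline may instead pad and chunk along the weight tensor's own dimensions.

```python
import numpy as np

def pad_and_chunk(weight, chunk_size):
    """Hypothetical helper: flatten a layer's weights, zero-pad to a multiple of
    chunk_size, and split into an array of shape (num_chunks, chunk_size)."""
    flat = np.asarray(weight, dtype=float).ravel()
    pad = (-flat.size) % chunk_size                # zeros needed to fill the last chunk
    flat = np.concatenate([flat, np.zeros(pad)])
    return flat.reshape(-1, chunk_size)

# Layers of different shapes, e.g. a conv kernel and a linear weight matrix.
conv = np.ones((8, 3, 3, 3))                       # 216 parameters
linear = np.ones((10, 7))                          # 70 parameters

for name, w in [("conv", conv), ("linear", linear)]:
    print(name, pad_and_chunk(w, chunk_size=32).shape)
# conv (7, 32)
# linear (3, 32)
```

Because every chunk has the same size, layers of any shape reduce to a variable-length set of equal-sized chunks. A smaller chunk size yields more, smaller chunks per layer, which is one way such a pipeline can be tuned to memory limits.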

Paper Structure (33 sections, 1 theorem, 16 equations, 11 figures, 16 tables)

Key Result

Proposition C.1

Logit invariance error minimization implies $\sigma_{\max}(W(t)) \leq \sigma_{\max}(W(0))$ as $t \rightarrow \infty$.
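The logit invariance idea can be sketched as a penalty on how much an encoder's outputs (logits) change when its input weights are transformed by a function-preserving symmetry. The sketch assumes that symmetry is a permutation of the hidden neurons of a one-hidden-layer MLP and that the penalty is a Monte Carlo estimate of $\mathbb{E}_g \lVert f(g \cdot w) - f(w) \rVert^2$; the toy linear encoder and all function names are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2):
    """One-hidden-layer ReLU MLP whose weights are the encoder's input."""
    return np.maximum(x @ w1.T + b1, 0.0) @ w2.T

def permute_hidden(w1, b1, w2, perm):
    """Reorder hidden neurons; the permuted MLP computes the same function."""
    return w1[perm], b1[perm], w2[:, perm]

def encoder_logits(w1, b1, w2, theta):
    """Hypothetical toy encoder without weight tying: a linear map on flattened weights."""
    flat = np.concatenate([w1.ravel(), b1, w2.ravel()])
    return theta @ flat

def logit_invariance_penalty(w1, b1, w2, theta, num_samples, rng):
    """Monte Carlo estimate of E_g ||f(g . w) - f(w)||^2 over random hidden permutations g."""
    base = encoder_logits(w1, b1, w2, theta)
    total = 0.0
    for _ in range(num_samples):
        perm = rng.permutation(b1.size)
        diff = encoder_logits(*permute_hidden(w1, b1, w2, perm), theta) - base
        total += float(diff @ diff)
    return total / num_samples

d_in, d_hid, d_out, n_logits = 3, 5, 2, 4
w1 = rng.normal(size=(d_hid, d_in))
b1 = rng.normal(size=d_hid)
w2 = rng.normal(size=(d_out, d_hid))
x = rng.normal(size=(6, d_in))

# The permutation is a genuine symmetry: the MLP's outputs are unchanged.
perm = rng.permutation(d_hid)
same_function = np.allclose(mlp(x, w1, b1, w2), mlp(x, *permute_hidden(w1, b1, w2, perm)))

n_params = w1.size + b1.size + w2.size
theta = rng.normal(size=(n_logits, n_params))      # generic encoder: not invariant
invariant_theta = np.ones((1, n_params))           # sums all weights: exactly invariant

print(same_function)
print(logit_invariance_penalty(w1, b1, w2, theta, 8, rng))            # positive
print(logit_invariance_penalty(w1, b1, w2, invariant_theta, 8, rng))  # rounding error only
```

In training, such a penalty would be added to the property-prediction loss with a weighting coefficient, so the encoder learns approximate invariance rather than having it built in through weight tying. The all-ones readout `invariant_theta` sums every parameter, which is exactly permutation invariant, so its penalty vanishes up to floating-point rounding.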

Figures (11)

  • Figure 1: Legend: Padding, Set-to-Set Function, Set-to-Vector Function, Layer-Level Encoder, Layer-Type Encoder. Concept: (left) Given a layer's weights, SNE first pads them and splits them into chunks of a fixed chunk size. Each chunk passes through a series of set-to-set and set-to-vector functions to produce a chunk representation vector. Layer-level and layer-type positional encodings inject structural information about the network at each stage of the chunk encoding process. All chunk vectors of a layer are then encoded together to obtain the layer encoding. (right) All layer encodings in the network are encoded, again with a series of set-to-set and set-to-vector functions, to obtain the network encoding vector. This vector is then used to predict the neural network property of interest.
  • Figure 1: Predicting Frequencies of Implicit Neural Representations (INRs).
  • Figure 2: Cross-Architecture Performance Prediction.
  • Figure 2: t-SNE Visualization of Neural Network Encodings. We train neural network performance prediction methods on a combination of the MNIST, FashionMNIST, CIFAR10, and SVHN model zoos of statnn. We present three views of the resulting 3-D plots, showing how each method encodes the neural networks from each model zoo. Larger versions of these figures are provided in the appendix. Zoom in for better viewing.
  • Figure 3: Cross-Architecture Performance Prediction on hypermodelzoo's model zoo.
  • ...and 6 more figures
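The chunk-to-layer-to-network hierarchy described in the Figure 1 caption can be sketched end to end. Everything here is a hypothetical stand-in under stated assumptions: random untrained weight matrices, mean pooling as the set-to-vector function, sinusoidal encodings of the layer index and of a layer-type id added as positional information, and flatten-and-pad chunking. It only shows how networks with different numbers and shapes of layers map to vectors of the same size.

```python
import numpy as np

rng = np.random.default_rng(0)
CHUNK, H = 16, 8
LAYER_TYPES = {"conv": 0, "linear": 1}             # hypothetical layer-type vocabulary

W_elem = rng.normal(size=(1, H))                   # embeds each scalar weight (untrained)
W_layer = rng.normal(size=(H, H)) / np.sqrt(H)
W_net = rng.normal(size=(H, H)) / np.sqrt(H)

def sinusoidal(pos, dim):
    """Sinusoidal encoding of an integer position into a dim-sized vector."""
    i = np.arange(dim // 2)
    angles = pos / (10000.0 ** (2 * i / dim))
    return np.concatenate([np.sin(angles), np.cos(angles)])

def pad_and_chunk(weight, chunk_size):
    """Flatten, zero-pad to a multiple of chunk_size, reshape into chunks."""
    flat = np.asarray(weight, dtype=float).ravel()
    flat = np.concatenate([flat, np.zeros((-flat.size) % chunk_size)])
    return flat.reshape(-1, chunk_size)

def set_encode(x, w, pos):
    """Set-to-set (shared map plus positional encoding), then set-to-vector (mean)."""
    return np.tanh(x @ w + pos).mean(axis=0)

def encode_network(layers):
    """layers: list of (weight_array, layer_type). Returns a vector in R^H."""
    layer_vecs = []
    for level, (weight, layer_type) in enumerate(layers):
        pos = sinusoidal(level, H) + sinusoidal(LAYER_TYPES[layer_type], H)
        # Each chunk is a set of scalar weights -> one chunk vector.
        chunk_vecs = np.stack([set_encode(c[:, None], W_elem, pos)
                               for c in pad_and_chunk(weight, CHUNK)])
        # The set of chunk vectors -> one layer vector.
        layer_vecs.append(set_encode(chunk_vecs, W_layer, pos))
    # The set of layer vectors -> one network vector.
    return set_encode(np.stack(layer_vecs), W_net, np.zeros(H))

small = [(rng.normal(size=(4, 1, 3, 3)), "conv"), (rng.normal(size=(10, 4)), "linear")]
large = [(rng.normal(size=(8, 3, 3, 3)), "conv"), (rng.normal(size=(16, 8, 3, 3)), "conv"),
         (rng.normal(size=(10, 16)), "linear")]
print(encode_network(small).shape, encode_network(large).shape)   # (8,) (8,)
```

Both the two-layer and the three-layer network produce an `H`-dimensional encoding, which a property predictor could then consume.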

Theorems & Definitions (2)

  • Definition C.1: Logit Invariance
  • Proposition C.1: Invariance Induced by Spectral Decay