SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space

Zitong Huang, Mansooreh Montazerin, Ajitesh Srivastava

TL;DR

SWAT-NN addresses the challenge of jointly optimizing neural network architecture and weights by learning a universal, functionally meaningful latent embedding of networks and performing gradient-based optimization directly in that space. A multi-scale autoencoder encodes both structure and parameters into a latent vector, from which multiple decoders generate MLPs of varying depths; optimization then selects networks tailored to a dataset, with sparsity and compactness penalties guiding pruning. Across 54 CORNN regression datasets, SWAT-NN yields sparser, more compact models whose accuracy is comparable or superior to that of NAS baselines such as DARTS+ADMM and fixed-activation autoencoders. The approach treats networks as holistic function approximators in a continuous space, enabling gradient-driven discovery of hardware-friendly models without task-specific predictors.

Abstract

Designing neural networks typically relies on manual trial and error or a neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often discretizes architecture search and weight optimization. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped closer together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain the optimal neural network, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
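
The latent-space search described in the abstract can be illustrated with a deliberately small sketch. It is not the paper's implementation: the frozen `DECODER` matrix, the toy dataset, and the names `decode`, `loss`, `grad`, and `optimize_latent` are all hypothetical stand-ins, and the decoded "network" is a 3-weight linear model rather than an MLP. What the sketch does show is the core loop: initialize a random latent point, decode it into weights, and update the latent point by gradient descent on a task loss plus an L1 sparsity penalty. The compactness penalty is omitted.

```python
import random

# Hypothetical frozen decoder: latent z (2-D) -> 3 weights of a linear model.
# In SWAT-NN the decoder is a trained network that emits an MLP; this is a toy.
DECODER = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# Toy regression data with target y = 2 * x0, so only the first weight is needed.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [1, 0, 1], [0, 1, 1]]
Y = [2.0 * row[0] for row in X]


def decode(z):
    """Map a latent point to model weights through the frozen decoder."""
    return [sum(d * zi for d, zi in zip(row, z)) for row in DECODER]


def loss(z, lam):
    """Mean squared error plus an L1 sparsity penalty on the decoded weights."""
    w = decode(z)
    mse = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2 for x, y in zip(X, Y)
    ) / len(X)
    return mse + lam * sum(abs(wi) for wi in w)


def sign(v):
    return (v > 0) - (v < 0)


def grad(z, lam):
    """Gradient of the loss with respect to the latent point z."""
    w = decode(z)
    # dL/dw: L1 subgradient, then the MSE gradient accumulated per sample.
    g_w = [lam * sign(wi) for wi in w]
    for x, y in zip(X, Y):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            g_w[j] += 2.0 * err * xj / len(X)
    # Chain rule through the frozen decoder: dL/dz = DECODER^T @ dL/dw.
    return [
        sum(DECODER[j][k] * g_w[j] for j in range(len(w))) for k in range(len(z))
    ]


def optimize_latent(lam=0.01, lr=0.1, steps=300, seed=0):
    """Randomly initialize a latent point and refine it by gradient descent."""
    rng = random.Random(seed)
    z = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    initial = loss(z, lam)
    for _ in range(steps):
        z = [zi - lr * gi for zi, gi in zip(z, grad(z, lam))]
    return z, initial, loss(z, lam)


z_final, loss_before, loss_after = optimize_latent()
print(decode(z_final), loss_before, loss_after)
```

Only the latent point is updated; the decoder stays fixed throughout. In this toy setting the L1 term pushes the two weights that the target does not need toward zero, which mirrors, in a much simpler form, the role the abstract assigns to the sparsity penalty.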

Paper Structure

This paper contains 28 sections, 13 equations, 15 figures, 1 table.

Figures (15)

  • Figure 1: An illustration of the overall idea for SWAT-NN
  • Figure 2: Left: Multi-scale autoencoder architecture with four decoders, each corresponding to MLPs with 1–4 hidden layers. Right: Pipeline for training optimal MLPs through gradient-based search in the continuous latent space learned by the autoencoder.
  • Figure 3: Matrix representation for a 2 hidden-layer MLP with 3 neurons per layer. Different colors indicate elements associated with neurons from different layers, encoding the outgoing weights, activation functions, and bias terms of each neuron. Entries marked with 'X' represent zero-padding.
  • Figure 4: Test MSE vs. number of non-zero weights for DARTS+ADMM and SWAT-NN across 20 randomly selected datasets. Each point represents a model configuration, plotted by its test MSE and number of non-zero weights.
  • Figure 5: Top: Test MSE of the best-performing models identified by DARTS+ADMM and SWAT-NN across all 54 datasets. Bottom: Corresponding number of non-zero weights. Function labels are color-coded to indicate which method yields the sparser, more compact model.
  • ...and 10 more figures
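
The Figure 3 caption describes a matrix representation in which each neuron contributes its outgoing weights, activation function, and bias, with zero-padding where entries are unused. The exact layout is not recoverable from the caption alone, so the sketch below assumes one: a row per neuron, holding the outgoing weights padded to a fixed width, then an activation code, then the bias. The `ACTIVATIONS` codes and the helper names `neuron` and `encode_mlp` are hypothetical.

```python
# Assumed activation codes; the paper's actual encoding is not given here.
ACTIVATIONS = {"none": 0.0, "relu": 1.0, "tanh": 2.0}


def neuron(out, act="none", bias=0.0):
    """Describe one neuron by its outgoing weights, activation, and bias."""
    return {"out": list(out), "act": act, "bias": bias}


def encode_mlp(layers, max_fan_out):
    """Flatten an MLP into a fixed-width matrix with one row per neuron.

    Assumed row layout: [outgoing weights, zero-padded | activation code | bias].
    """
    rows = []
    for layer in layers:
        for unit in layer:
            weights = unit["out"]
            if len(weights) > max_fan_out:
                raise ValueError("fan-out exceeds max_fan_out")
            padded = weights + [0.0] * (max_fan_out - len(weights))
            rows.append(padded + [ACTIVATIONS[unit["act"]], unit["bias"]])
    return rows


# 2 inputs -> 3 hidden neurons -> 3 hidden neurons -> 1 output.
mlp = [
    # Input neurons carry outgoing weights only (no activation, no bias).
    [neuron([0.1, 0.2, 0.3]), neuron([0.4, 0.5, 0.6])],
    # First hidden layer: full fan-out of 3 into the second hidden layer.
    [neuron([0.7, 0.8, 0.9], "relu", 0.1),
     neuron([1.0, 1.1, 1.2], "relu", 0.2),
     neuron([1.3, 1.4, 1.5], "relu", 0.3)],
    # Second hidden layer: fan-out of 1 into the output, so two slots are padded.
    [neuron([1.6], "tanh", 0.4),
     neuron([1.7], "tanh", 0.5),
     neuron([1.8], "tanh", 0.6)],
]

matrix = encode_mlp(mlp, max_fan_out=3)
for row in matrix:
    print(row)
```

A fixed-width layout of this kind lets networks of different shapes share one input format, which is presumably what makes a single autoencoder applicable across them.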