SWAT-NN: Simultaneous Weights and Architecture Training for Neural Networks in a Latent Space
Zitong Huang, Mansooreh Montazerin, Ajitesh Srivastava
TL;DR
SWAT-NN jointly optimizes neural network architecture and weights by learning a universal, functionally meaningful latent embedding of networks and performing gradient-based optimization directly in that space. A multi-scale autoencoder encodes both structure and parameters into a latent vector, from which multiple decoders generate MLPs of varying depths. Optimization in the latent space then selects a network tailored to a given dataset, with sparsity and compactness penalties guiding pruning. Across 54 CORNN regression datasets, SWAT-NN yields sparser, more compact models with accuracy comparable or superior to NAS baselines such as DARTS+ADMM and fixed-activation autoencoders. The approach treats networks as holistic function approximators in a continuous space, enabling gradient-driven discovery of hardware-friendly models without task-specific predictors.
Abstract
Designing neural networks typically relies on manual trial and error or on neural architecture search (NAS) followed by weight training. The former is time-consuming and labor-intensive, while the latter often treats architecture search and weight optimization as separate, discrete stages. In this paper, we propose a fundamentally different approach that simultaneously optimizes both the architecture and the weights of a neural network. Our framework first trains a universal multi-scale autoencoder that embeds both architectural and parametric information into a continuous latent space, where functionally similar neural networks are mapped close together. Given a dataset, we then randomly initialize a point in the embedding space and update it via gradient descent to obtain a neural network fitted to that dataset, jointly optimizing its structure and weights. The optimization process incorporates sparsity and compactness penalties to promote efficient models. Experiments on synthetic regression tasks demonstrate that our method effectively discovers sparse and compact neural networks with strong performance.
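The latent-space optimization step described in the abstract can be illustrated with a minimal sketch. This is not the SWAT-NN implementation: the trained multi-scale autoencoder is replaced by a hypothetical fixed linear decoder `D`, the decoded "network" is a single linear layer, and the sparsity penalty is assumed to be an L1 term on the decoded weights. All names (`decode`, `optimize_latent`, `lam`) are illustrative assumptions. It only shows the idea of initializing a random latent point and updating it by gradient descent through the decoder.

```python
import random

def decode(D, z):
    """Toy stand-in for a trained decoder: latent z -> weights of a linear model."""
    return [sum(d * zj for d, zj in zip(row, z)) for row in D]

def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def loss(D, z, data, lam):
    """Mean squared error plus an (assumed) L1 sparsity penalty on decoded weights."""
    w = decode(D, z)
    mse = sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)
    return mse + lam * sum(abs(wi) for wi in w)

def grad_z(D, z, data, lam):
    """Gradient of the loss with respect to the latent point z."""
    w = decode(D, z)
    g_w = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for i, xi in enumerate(x):
            g_w[i] += 2.0 * err * xi / len(data)
    for i, wi in enumerate(w):
        g_w[i] += lam * ((wi > 0) - (wi < 0))  # L1 subgradient
    # Chain rule through the linear decoder: g_z = D^T g_w
    return [sum(D[i][j] * g_w[i] for i in range(len(w))) for j in range(len(z))]

def optimize_latent(D, data, lam=0.01, lr=0.05, steps=500, seed=0):
    """Randomly initialize a latent point, then update it by gradient descent."""
    rng = random.Random(seed)
    z = [rng.uniform(-1.0, 1.0) for _ in range(len(D[0]))]
    history = [loss(D, z, data, lam)]
    for _ in range(steps):
        g = grad_z(D, z, data, lam)
        z = [zj - lr * gj for zj, gj in zip(z, g)]
        history.append(loss(D, z, data, lam))
    return z, history

# Synthetic regression data: y depends only on the first input feature.
rng = random.Random(1)
data = []
for _ in range(50):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    data.append((x, 2.0 * x[0]))

# Hypothetical decoder: 2 latent dimensions -> 3 model weights.
D = [[1.0, 0.5], [0.0, 1.0], [0.5, -1.0]]

z, history = optimize_latent(D, data)
print("initial loss:", history[0], "final loss:", history[-1])
```

In the actual method the decoder is a learned, nonlinear map that also emits architectural choices, so the gradient would come from automatic differentiation rather than the hand-written chain rule used here.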
