Weight-Entanglement Meets Gradient-Based Neural Architecture Search
Rhea Sanjay Sukthanker, Arjun Krishnakumar, Mahmoud Safari, Frank Hutter
TL;DR
This work bridges weight-sharing gradient-based NAS and weight-entangled (WE) macro spaces by introducing TangleNAS, a scheme that employs weight-superposition and combi-superposition to enable single-stage NAS in WE spaces. It reduces memory and forward-pass cost while preserving the expressiveness of macro-level architectural choices. It also performs strongly across toy, cell-based, and macro spaces (including AutoFormer, MobileNetV3, and language-model search spaces), often surpassing two-stage baselines. The results show competitive or superior accuracy, improved anytime performance, and meaningful reductions in memory usage. They are accompanied by detailed analyses of architecture design choices, of pretraining, fine-tuning, and retraining effects, and of transfer to ImageNet. By making broad architectural spaces efficient to explore, the approach advances practical NAS and could accelerate the design of scalable transformers and other large models.
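To make the weight-superposition idea above more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation. It assumes a single linear layer whose candidate output widths share one entangled weight matrix, with candidate k using the first `widths[k]` rows. The slicing convention and all names here (`superposed_weight`, `output_superposition`, and the rest) are hypothetical. DARTS-style output mixing runs one forward pass per candidate and then mixes the outputs. This sketch instead mixes the zero-padded weight slices using softmax-normalized architecture parameters, then runs a single forward pass with the mixed weight.

```python
import math
from typing import List, Sequence

# Hypothetical illustration of weight superposition in a weight-entangled
# layer; names and the "first `width` rows" slicing convention are assumptions.
Matrix = List[List[float]]
Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the architecture parameters."""
    m = max(logits)
    exps = [math.exp(a - m) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def superposed_weight(w_max: Matrix, widths: Sequence[int],
                      probs: Sequence[float]) -> Matrix:
    """Mix the zero-padded slices of one shared (entangled) weight matrix.

    Candidate k owns rows 0..widths[k]-1 of w_max; its remaining rows count as
    zero, so they contribute nothing to the mixture.
    """
    rows, cols = len(w_max), len(w_max[0])
    mixed = [[0.0] * cols for _ in range(rows)]
    for width, p in zip(widths, probs):
        for i in range(width):
            for j in range(cols):
                mixed[i][j] += p * w_max[i][j]
    return mixed


def output_superposition(w_max: Matrix, widths: Sequence[int],
                         probs: Sequence[float], x: Vector) -> Vector:
    """Reference: one forward pass per candidate, outputs zero-padded and mixed."""
    out = [0.0] * len(w_max)
    for width, p in zip(widths, probs):
        y_k = matvec(w_max[:width], x)  # forward pass of the sliced candidate
        for i, v in enumerate(y_k):
            out[i] += p * v
    return out


# Usage: a 4x2 shared weight with candidate output widths 2, 3 and 4.
w_max = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
widths = [2, 3, 4]
probs = softmax([0.1, 0.5, -0.2])
x = [1.0, 0.5]

y_ws = matvec(superposed_weight(w_max, widths, probs), x)  # single forward pass
y_os = output_superposition(w_max, widths, probs, x)       # three forward passes
print(all(abs(a - b) < 1e-9 for a, b in zip(y_ws, y_os)))
```

For a single linear layer, the two mixtures coincide by linearity, so the final check prints `True`. Once nonlinearities sit between layers, the two generally differ. In this simplified setting, the benefit of mixing at the weight level is that only one forward pass is executed, regardless of how many candidates share the weight.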
Abstract
Weight sharing is a fundamental concept in neural architecture search (NAS), enabling gradient-based methods to explore cell-based architectural spaces significantly faster than traditional black-box approaches. In parallel, weight-entanglement has emerged as a technique for more intricate parameter sharing within macro-architectural spaces. Since weight-entanglement is not directly compatible with gradient-based NAS methods, these two paradigms have largely developed independently, in separate sub-communities. This paper aims to bridge the gap between these sub-communities by proposing a novel scheme to adapt gradient-based methods for weight-entangled spaces. This enables us to conduct an in-depth comparative assessment and analysis of the performance of gradient-based NAS in weight-entangled search spaces. Our findings reveal that this integration of weight-entanglement and gradient-based NAS brings forth the various benefits of gradient-based methods, while preserving the memory efficiency of weight-entangled spaces. The code for our work is openly accessible at https://github.com/automl/TangleNAS.
