Multi-objective Differentiable Neural Architecture Search
Rhea Sanjay Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter
TL;DR
MODNAS addresses Pareto front profiling in hardware-aware multi-objective NAS by introducing a MetaHypernetwork that conditions architecture distributions on device features and user preference vectors. It unifies the search across multiple devices into a single gradient-based procedure built from a one-shot supernet, a differentiable Architect, and a MetaPredictor that estimates hardware metrics, which enables zero-shot transfer to unseen devices. The approach scales to as many as 19 devices and 3 objectives, and yields representative and diverse Pareto-optimal architectures across search spaces such as NAS-Bench-201, MobileNetV3 (OFA), Transformers (HAT), and HW-GPT-Bench, with higher hypervolume and lower search cost than existing methods. In practice, it delivers Pareto front profiles with minimal additional search and supports rapid adaptation to varying hardware constraints and user preferences. The authors also acknowledge the limitations of gradient-based search and open questions about generalization.
Abstract
Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto-optimal solutions, is challenging, especially with expensive objectives that require training a neural network. Typically, in MOO for neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front then requires a computationally expensive search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences to trade off performance and hardware metrics, yielding representative and diverse architectures across multiple devices in a single search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that can be conditioned on hardware features and preference vectors, enabling zero-shot transferability to new devices. Extensive experiments involving up to 19 hardware devices and 3 different objectives demonstrate the effectiveness and scalability of our method. Finally, we show that, at no additional cost, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation, and a decoder-only space for language modelling.
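To make the conditioning idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy linear hypernetwork that maps a device feature vector concatenated with a preference vector to one categorical distribution over candidate operations per edge, plus a linear scalarization of the objectives by the preference vector. The names `ToyHypernetwork` and `scalarize`, the feature dimensions, and the linear form are all hypothetical choices made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyHypernetwork:
    """Hypothetical linear hypernetwork (an assumption, not the paper's model).

    Maps (device features, preference vector) to one categorical
    distribution over candidate operations for each edge.
    """

    def __init__(self, n_device_feats, n_objectives, n_edges, n_ops, seed=0):
        rng = random.Random(seed)
        self.n_edges = n_edges
        self.n_ops = n_ops
        n_in = n_device_feats + n_objectives
        # One weight row per (edge, operation) logit.
        self.weights = [
            [rng.gauss(0.0, 1.0) for _ in range(n_in)]
            for _ in range(n_edges * n_ops)
        ]

    def __call__(self, device_feats, preference):
        # Condition on both inputs by concatenating them.
        x = list(device_feats) + list(preference)
        logits = [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]
        # Normalize the logits of each edge into a distribution.
        return [
            softmax(logits[e * self.n_ops:(e + 1) * self.n_ops])
            for e in range(self.n_edges)
        ]


def scalarize(losses, preference):
    """Preference-weighted sum of objective values (linear scalarization)."""
    return sum(p * l for p, l in zip(preference, losses))


# Hypothetical device descriptor and two different user preferences
# over (task loss, latency).
hyper = ToyHypernetwork(n_device_feats=4, n_objectives=2, n_edges=3, n_ops=5)
device = [0.2, 0.7, 0.1, 0.9]
dist_acc = hyper(device, [0.9, 0.1])  # preference favouring task loss
dist_lat = hyper(device, [0.1, 0.9])  # preference favouring latency
```

Because the preference vector is an input rather than a fixed constraint, a single set of hypernetwork weights can produce a different architecture distribution for each trade-off, which is what allows one search run to cover many points of the Pareto front. Passing the feature vector of a new device would likewise yield distributions without retraining, under the assumption that the features are informative for that device.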
