Multi-objective Differentiable Neural Architecture Search

Rhea Sanjay Sukthanker; Arber Zela; Benedikt Staffler; Samuel Dooley; Josif Grabocka; Frank Hutter

TL;DR

MODNAS tackles the challenge of profiling the Pareto front in hardware-aware multi-objective NAS by introducing a MetaHypernetwork that conditions architecture distributions on device features and user preference vectors. It unifies the search across multiple devices into a single gradient-based procedure that combines a one-shot supernet, a differentiable Architect, and a MetaPredictor that estimates hardware metrics, which enables zero-shot transfer to unseen devices. The approach scales to as many as 19 devices and 3 objectives, and yields representative and diverse Pareto-optimal architectures across search spaces such as NAS-Bench-201, MobileNetV3 (OFA), Transformers (HAT), and HW-GPT-Bench, with superior hypervolume and reduced search cost. Because a single search run produces the Pareto front profile, the method adapts quickly to varying hardware constraints and user preferences. The authors also acknowledge the limitations of gradient-based search and open questions about generalization.
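Hypervolume, the quality measure mentioned above, can be illustrated for two objectives. The following is a minimal sketch, assuming both objectives are minimized (for example, classification error and latency) and that the reference point is chosen by the user. The function name `hypervolume_2d` and all numbers are illustrative and are not taken from the paper.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded by `ref`; both objectives are minimized."""
    # Keep only points that strictly improve on the reference in both objectives.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area = 0.0
    best_y = ref[1]  # lowest second-objective value seen so far
    for x, y in pts:  # ascending in the first objective
        if y < best_y:  # non-dominated among the points seen so far
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


# Illustrative (error, latency) pairs; the values are made up.
front = [(1, 3), (2, 2), (3, 1)]
reference = (4, 4)
hv = hypervolume_2d(front, reference)
hv_with_dominated = hypervolume_2d(front + [(3, 3)], reference)
```

A dominated point such as `(3, 3)` adds no area, so a larger hypervolume reflects a front that is both closer to the ideal point and better spread out.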

Abstract

Pareto front profiling in multi-objective optimization (MOO), i.e., finding a diverse set of Pareto optimal solutions, is challenging, especially with expensive objectives that require training a neural network. Typically, in MOO for neural architecture search (NAS), we aim to balance performance and hardware metrics across devices. Prior NAS approaches simplify this task by incorporating hardware constraints into the objective function, but profiling the Pareto front necessitates a computationally expensive search for each constraint. In this work, we propose a novel NAS algorithm that encodes user preferences to trade off performance and hardware metrics, yielding representative and diverse architectures across multiple devices in just a single search run. To this end, we parameterize the joint architectural distribution across devices and multiple objectives via a hypernetwork that can be conditioned on hardware features and preference vectors, enabling zero-shot transferability to new devices. Extensive experiments involving up to 19 hardware devices and 3 different objectives demonstrate the effectiveness and scalability of our method. Finally, we show that, without any additional cost, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation, and a decoder-only space for language modelling.
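To make the notion of Pareto optimality concrete, here is a minimal sketch of non-dominated filtering over already-evaluated candidates. It assumes every objective is minimized (accuracy would be converted to error first). The helper names `dominates` and `pareto_front` and the candidate values are hypothetical. This brute-force filter only illustrates the definition and is not the search procedure proposed in the paper.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates (quadratic-time filter)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (error in %, latency in ms) pairs; the values are made up.
candidates = [(6.0, 12.0), (7.5, 8.0), (9.0, 5.0), (8.0, 9.0), (9.5, 5.5)]
front = pareto_front(candidates)
```

Here `(8.0, 9.0)` is dominated by `(7.5, 8.0)` and `(9.5, 5.5)` by `(9.0, 5.0)`, so three candidates remain on the front.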

Paper Structure (48 sections, 14 equations, 44 figures, 6 tables, 5 algorithms)

This paper contains 48 sections, 14 equations, 44 figures, 6 tables, 5 algorithms.

Introduction
Background and Related Work
Hardware-aware Multi-objective Differentiable Neural Architecture Search
Problem Definition & Sketch of Solution Approach
Algorithm Design and Components
Optimizing the MetaHypernetwork via MGD
Experiments
Simultaneous Pareto Set Learning across 19 devices and Ablations
Pareto Front Profiling on Transformer Space
Efficient Differentiable MOO starting from Pretrained Supernetworks
Computational Complexity
Conclusions, Broader Impact and Limitations
Appendix
Algorithmic components
Discrete Samplers
...and 33 more sections

Figures (44)

Figure 1: MODNAS overview. Given a set of $T$ devices, MODNAS seeks to optimize $M$ (potentially conflicting) objectives across these devices. To this end, it employs a MetaHypernetwork $H_\Phi(r, d_t)$ that takes as input a scalarization $r$, representing the user preferences, and a device embedding $d_t$, to yield an un-normalized architectural distribution $\tilde{\alpha}$. The Architect uses $\tilde{\alpha}$ to sample differentiable discrete architectures, which are used in the Supernetwork to estimate accuracy and in the MetaPredictor to estimate the other $M-1$ loss functions (e.g., latency, energy consumption) for every device. By iterating over devices and sampling scalarizations uniformly from the $M$-dimensional simplex, at each iteration we update the MetaHypernetwork using multiple gradient descent (MGD).
Figure 2: Architecture overview of the MetaHypernetwork, which takes as input a device embedding $d_t$ (input to an embedding layer $\mathtt{E}$) and a scalarization $r$ (input to $K$ hypernetworks) and yields an architecture encoding $\tilde{\alpha}$.
Figure 3: Hypervolume (HV) of MODNAS and baselines across 19 devices on NAS-Bench-201. For every device, we optimize for 2 objectives, namely latency (ms) and test accuracy on CIFAR-10. For each method, metric, and device we report the mean of 3 independent search runs. A larger area in the radar plot indicates better HV. Test devices are colored in red around the plot.
Figure 4: Illustration of MODNAS inference.
Figure 5: HV over number of evaluated architectures on NAS-Bench-201 of MODNAS and the blackbox MOO baselines on a test device. For MODNAS we only do 24 full evaluations.
...and 39 more figures
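The data flow described in the captions of Figures 1 and 2 can be sketched as follows: sample a preference vector uniformly from the simplex, feed it together with a device embedding into a hypernetwork, and read off un-normalized architecture logits. This is a toy illustration only. The class `ToyMetaHypernetwork`, its single linear layer, the embedding size, and the concatenation of inputs are assumptions made for the example. The captions do not specify how the $K$ hypernetworks and the embedding layer are combined, and the sketch omits the Architect, Supernetwork, MetaPredictor, and the MGD update.

```python
import numpy as np


def sample_scalarization(num_objectives, rng):
    # Dirichlet(1, ..., 1) is the uniform distribution on the probability simplex.
    return rng.dirichlet(np.ones(num_objectives))


class ToyMetaHypernetwork:
    """Hypothetical stand-in for H_Phi(r, d_t); not the authors' architecture."""

    def __init__(self, num_devices, num_objectives, num_edges, num_ops,
                 embed_dim=8, rng=None):
        if rng is None:
            rng = np.random.default_rng()
        # One learnable embedding vector per device (assumed design).
        self.device_embedding = rng.normal(size=(num_devices, embed_dim))
        # A single linear map from [embedding, r] to per-edge operation logits.
        in_dim = embed_dim + num_objectives
        self.weight = 0.1 * rng.normal(size=(in_dim, num_edges * num_ops))
        self.shape = (num_edges, num_ops)

    def __call__(self, r, device_index):
        features = np.concatenate([self.device_embedding[device_index], r])
        return (features @ self.weight).reshape(self.shape)


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
# 6 edges with 5 candidate operations each, as in a NAS-Bench-201 cell.
hypernet = ToyMetaHypernetwork(num_devices=19, num_objectives=2,
                               num_edges=6, num_ops=5, rng=rng)
r = sample_scalarization(2, rng)            # user preference vector
alpha_tilde = hypernet(r, device_index=3)   # un-normalized logits
alpha = softmax(alpha_tilde)                # per-edge operation probabilities
```

Changing `r` or `device_index` changes the resulting distribution without retraining, which is the mechanism the captions describe for trading off objectives per device at inference time.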

Theorems & Definitions (2)

Definition 2.1
Definition 2.2
