
Sim-is-More: Randomizing HW-NAS with Synthetic Devices

Francesco Capuano, Gabriele Tiboni, Niccolò Cavagnero, Giuseppe Averta

TL;DR

This paper tackles multi-device hardware-aware NAS by eliminating reliance on pre-deployment latency models. It introduces a two-stage framework where a controller is trained on synthetic device distributions and then deployed on a real target device with only a few high-fidelity latency measurements for adaptation. The approach leverages training-free accuracy proxies and domain randomization to enable cross-device generalization while keeping test-time costs low, demonstrated on the NATS-Bench space with limited real-world probes (as few as 10). By avoiding latency predictors and LUT-based estimates, the method offers a risk-aware, scalable path for deploying latency-efficient architectures across diverse hardware platforms.
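The TL;DR does not specify what a "synthetic device" looks like. The following is a minimal sketch of domain randomization over devices. It assumes the NATS-Bench topology search space (a cell with 6 edges, each choosing one of 5 operations) and a hypothetical additive latency model whose per-operation costs are drawn independently for every device. `SyntheticDevice`, `sample_device`, and the cost ranges are illustrative assumptions, not the paper's actual device generator.

```python
import random
from dataclasses import dataclass

# NATS-Bench topology search space: a cell with 6 edges, each edge picks one of 5 ops.
OPS = ["none", "skip_connect", "nor_conv_1x1", "nor_conv_3x3", "avg_pool_3x3"]
NUM_EDGES = 6


@dataclass
class SyntheticDevice:
    """Hypothetical synthetic device with an additive per-(edge, op) latency model."""
    op_cost: list[list[float]]  # op_cost[edge][op_index], illustrative units (ms)
    overhead: float             # fixed per-inference cost, same illustrative units

    def latency(self, arch: tuple[int, ...]) -> float:
        # arch holds one op index per edge; total latency = overhead + sum of edge costs.
        return self.overhead + sum(self.op_cost[e][op] for e, op in enumerate(arch))


def sample_device(rng: random.Random) -> SyntheticDevice:
    """Domain randomization (assumed scheme): draw every cost independently,
    so the relative cost of operations differs from one device to the next."""
    op_cost = [[rng.uniform(0.0, 2.0) for _ in OPS] for _ in range(NUM_EDGES)]
    for edge_costs in op_cost:
        edge_costs[0] = 0.0  # assumption: the 'none' op (index 0) is free on every device
    return SyntheticDevice(op_cost=op_cost, overhead=rng.uniform(0.5, 5.0))


if __name__ == "__main__":
    rng = random.Random(0)
    devices = [sample_device(rng) for _ in range(4)]  # e.g. four devices, as in Figure 2
    arch = (3, 1, 2, 0, 4, 3)  # one op index per edge
    print([round(d.latency(arch), 3) for d in devices])
```

Because costs are sampled per device, the same architecture can be cheap on one synthetic device and expensive on another, which is the kind of variation a controller would need to see during meta-training to generalize to unseen hardware.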

Abstract

Existing hardware-aware NAS (HW-NAS) methods typically assume access to precise information about the target device, either via analytical approximations of the post-compilation latency model or through learned latency predictors. Such approximations risk introducing estimation errors that may prove detrimental in risk-sensitive applications. In this work, we propose a two-stage HW-NAS framework in which we first learn an architecture controller on a distribution of synthetic devices and then deploy the controller directly on a target device. At test time, the controller relies on no pre-collected information about the target device and only exploits direct interactions with it. In particular, pre-training on synthetic devices enables the controller to design an architecture for the target device by interacting with it through a small number of high-fidelity latency measurements. To keep our method accessible, we train our controller exclusively with training-free accuracy proxies, which lets us scale the meta-training phase without incurring the overhead of full network training. We benchmark on HW-NATS-Bench, demonstrating that our method generalizes to unseen devices and finds latency-efficient architectures through in-context adaptation, using only a few real-world latency evaluations at test time.
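The test-time phase described above reduces to a loop with a hard budget on real latency measurements. The sketch below is a hedged illustration of that budget accounting only. The paper's controller is a learned policy that adapts in context; here a hypothetical `mutate_one_edge` function that changes the operation on one random edge stands in for it, and `measure_latency` and `reward` are caller-supplied placeholders. The default `budget=10` mirrors the "as few as 10" probes mentioned in the TL;DR.

```python
import random
from typing import Callable, Tuple

Arch = Tuple[int, ...]
NUM_EDGES, NUM_OPS = 6, 5  # NATS-Bench topology space: 6 edges, 5 candidate ops per edge


def mutate_one_edge(arch: Arch, rng: random.Random) -> Arch:
    """Placeholder for the learned controller: swap the op on one randomly chosen edge."""
    edge = rng.randrange(NUM_EDGES)
    new_op = rng.choice([op for op in range(NUM_OPS) if op != arch[edge]])
    return arch[:edge] + (new_op,) + arch[edge + 1:]


def search_on_device(
    measure_latency: Callable[[Arch], float],  # one call = one real on-device measurement
    reward: Callable[[Arch, float], float],    # combines accuracy proxy and latency
    init_arch: Arch,
    budget: int = 10,
    seed: int = 0,
) -> Tuple[Arch, float, float]:
    """Spend exactly `budget` latency probes; return the best (arch, latency, reward) seen."""
    rng = random.Random(seed)
    init_lat = measure_latency(init_arch)  # probe 1
    best = (init_arch, init_lat, reward(init_arch, init_lat))
    for _ in range(budget - 1):  # remaining probes
        cand = mutate_one_edge(best[0], rng)
        cand_lat = measure_latency(cand)
        cand_reward = reward(cand, cand_lat)
        if cand_reward > best[2]:
            best = (cand, cand_lat, cand_reward)
    return best


if __name__ == "__main__":
    # Stand-in "device": latency grows with op index (purely illustrative).
    fake_device = lambda arch: 1.0 + 0.1 * sum(arch)
    # Stand-in reward: prefer lower latency only (no accuracy proxy here).
    best_arch, best_lat, _ = search_on_device(fake_device, lambda a, lat: -lat, (3,) * 6)
    print(best_arch, round(best_lat, 2))
```

Counting probes explicitly makes the test-time cost auditable: no latency predictor or lookup table is consulted, and every latency value the search uses comes from a direct measurement.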

Paper Structure

This paper contains 19 sections, 2 equations, 6 figures.

Figures (6)

  • Figure 1: Overview of our method. HW-NAS across different hardware platforms is hindered by fundamental differences between devices, which shape the performance/efficiency trade-off differently on each device (A). Our method consists of a two-stage process: we first learn on a distribution of synthetic devices (B, 1) and then zero-shot transfer the learned policy to multiple devices (B, 2).
  • Figure 2: Performance/Efficiency profiles for four synthetic devices generated during training. Networks are scored using $r(h) = p_\text{FreeREA}(h) + \bar{\ell}(h)$.
  • Figure 3: Correlation between downstream validation accuracy and $p_\text{FreeREA}$ score. Highlighted points indicate $h^*$ for different devices, further underscoring how the optimal architecture in HW-NAS depends on the device considered.
  • Figure 4: Overview of the policy network for our method. At training time, the policy $\pi$ accesses (1) a candidate network $h$ and its associated latency $\text{Latency}(h)$. The policy is then trained to propose a modification to $h$ by changing the operation at one of its positions.
  • Figure 5: Average results collected over 20 test episodes during training. (a) Average (normalized) $p_\text{FreeREA}(h_T)$; (b) average (normalized) $\bar{\ell}(h_T)$; (c) average latency percentile of $h_T$; (d) average final validation accuracy of $h_T$ (never accessed during training); (e) average latency of a fixed reference network $h_{\text{ref.}}$; (f) average evolution of the cumulative reward over training.
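The captions of Figures 2 and 5 score networks with $r(h) = p_\text{FreeREA}(h) + \bar{\ell}(h)$ and report a latency percentile, but they do not define the normalizations. A minimal sketch under stated assumptions: $\bar{\ell}$ is taken as a min-max normalized latency flipped so that faster networks score higher (otherwise adding it would reward slower networks), the FreeREA score is passed in already scaled to [0, 1] rather than computed, and the latency percentile is read as the share of networks in the space that are strictly faster. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import Sequence


def latency_score(lat: float, lat_min: float, lat_max: float) -> float:
    """Hypothetical normalized latency term: 1.0 for the fastest network, 0.0 for the slowest."""
    return (lat_max - lat) / (lat_max - lat_min)


def reward(proxy_norm: float, lat: float, lat_min: float, lat_max: float) -> float:
    """r(h) = p_FreeREA(h) + l_bar(h); proxy_norm is assumed already scaled to [0, 1]."""
    return proxy_norm + latency_score(lat, lat_min, lat_max)


def latency_percentile(lat: float, sorted_lats: Sequence[float]) -> float:
    """Percentage of architectures in the space that are strictly faster than `lat`."""
    return 100.0 * bisect_left(sorted_lats, lat) / len(sorted_lats)


if __name__ == "__main__":
    space_lats = sorted([1.0, 2.0, 3.0, 4.0])  # toy latencies for a 4-network space
    print(reward(0.5, 2.0, space_lats[0], space_lats[-1]))
    print(latency_percentile(2.0, space_lats))
```

Putting both terms on a common [0, 1] scale is one simple way to keep either the accuracy proxy or the latency term from dominating the sum; the per-device min and max would differ across synthetic devices, which is consistent with the device-dependent profiles in Figure 2.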