
Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML

Mark Deutel, Georgios Kontes, Christopher Mutschler, Jürgen Teich

TL;DR

This work addresses the deployment of DNNs on resource-constrained microcontrollers by jointly optimizing accuracy, memory (ROM/RAM), and computational cost (FLOPs) through a novel combination of multi-objective Bayesian optimization and Augmented Random Search (ARS) reinforcement learning. It formulates architecture search as hyperparameter optimization over pruning and quantization, using an ensemble of ARS-driven policies to sample promising hyperparameter configurations efficiently within a tight evaluation budget. The approach yields a Pareto front of deployable TinyML models that outperforms multiple multi-objective optimization (MOOpt) baselines (e.g., ParEGO, TurBO, MorBO) on the CIFAR-10 and DaLiAc tasks and remains robust under realistic memory constraints. In practice, the resulting method, ARS-MOBOpt, produces models that can be deployed directly on common microcontrollers, and it is open-sourced for reproducibility and further development.
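
The ARS component can be illustrated with the basic ARS update of Mania et al. (2018): perturb the weights of a linear policy in random directions, evaluate each direction in both signs, keep the top-b directions, and scale the step by the standard deviation of their rewards. The sketch below is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation. It assumes the policy maps the current hyperparameter point theta (plus a bias term) to a clipped step, and a toy quadratic with its minimum at (15, 5) stands in for whatever reward ARS-MOBOpt derives from its Bayesian surrogate. `toy_objective`, `rollout`, `perturb`, `ars`, and all hyperparameter values are hypothetical; the paper's state, action, and reward definitions may differ.

```python
import random
import statistics

def toy_objective(theta):
    # Hypothetical stand-in for a surrogate-predicted cost (lower is better).
    # The minimum at (15, 5) only echoes the synthetic example of Figure 4.
    return (theta[0] - 15.0) ** 2 + (theta[1] - 5.0) ** 2

def rollout(weights, start, horizon=10, max_step=2.0):
    """Run the linear policy step = W @ [theta, 1] and return the summed reward."""
    theta = list(start)
    total = 0.0
    for _ in range(horizon):
        features = theta + [1.0]  # current point plus a bias feature
        steps = [sum(w * x for w, x in zip(row, features)) for row in weights]
        # Clip each coordinate's step so a rollout cannot jump arbitrarily far.
        theta = [t + max(-max_step, min(max_step, s)) for t, s in zip(theta, steps)]
        total -= toy_objective(theta)  # reward = negative cost at each step
    return total

def perturb(weights, delta, scale):
    """Return weights + scale * delta (element-wise, as nested lists)."""
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weights, delta)]

def ars(start, iterations=50, n_dirs=8, top_b=4, step_size=0.02, noise=0.05, seed=0):
    """Basic ARS: antithetic random perturbations, top-b selection, reward-std scaling."""
    rng = random.Random(seed)
    rows, cols = len(start), len(start) + 1
    weights = [[0.0] * cols for _ in range(rows)]
    for _ in range(iterations):
        results = []
        for _ in range(n_dirs):
            delta = [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
            r_plus = rollout(perturb(weights, delta, noise), start)
            r_minus = rollout(perturb(weights, delta, -noise), start)
            results.append((r_plus, r_minus, delta))
        # Keep the top_b directions ranked by the better of their two rollouts.
        results.sort(key=lambda res: max(res[0], res[1]), reverse=True)
        best = results[:top_b]
        # Scale the step by the std of the 2 * top_b rewards used in the update.
        sigma = statistics.pstdev([r for rp, rm, _ in best for r in (rp, rm)]) or 1.0
        for r_plus, r_minus, delta in best:
            coef = step_size / (top_b * sigma) * (r_plus - r_minus)
            for i in range(rows):
                for j in range(cols):
                    weights[i][j] += coef * delta[i][j]
    return weights
```

With the defaults, `rollout(ars([0.0, 0.0]), [0.0, 0.0])` should exceed the -2500 scored by the all-zero policy, which never leaves the origin. In ARS-MOBOpt, an ensemble of such competing policies proposes candidate configurations; how the ensemble members are started and compared is not detailed in this summary.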

Abstract

Deploying deep neural networks (DNNs) on microcontrollers (TinyML) is an increasingly common way to process the growing amount of sensor data generated at the edge, but in practice, resource and latency constraints make it difficult to find optimal DNN candidates. Neural architecture search (NAS) is an excellent approach to automating this search and can easily be combined with DNN compression techniques commonly used in TinyML. However, many NAS techniques are not only computationally expensive (hyperparameter optimization (HPO) in particular) but also often optimize only a single objective, e.g., accuracy, without considering additional objectives such as the memory requirements or computational complexity of a DNN, which are key to making deployment at the edge feasible. In this paper, we propose a novel NAS strategy for TinyML based on multi-objective Bayesian optimization (MOBOpt) and an ensemble of competing parametric policies trained using Augmented Random Search (ARS) reinforcement learning (RL) agents. Our methodology aims to efficiently find trade-offs between a DNN's predictive accuracy, its memory requirements on a given target system, and its computational complexity. Our experiments show that our approach consistently outperforms existing MOBOpt approaches on different datasets and architectures such as ResNet-18 and MobileNetV3.
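
The trade-offs described above are usually expressed through Pareto dominance: a candidate is kept only if no other candidate is at least as good in every objective and strictly better in at least one. The sketch below assumes three objectives (accuracy to maximize, memory footprint and FLOPs to minimize) plus an optional memory budget that keeps only models that fit the target. The `Candidate` fields, units, and numbers are hypothetical and are not results from the paper.

```python
from typing import List, NamedTuple, Optional

class Candidate(NamedTuple):
    name: str
    accuracy: float   # maximize
    memory_kb: float  # minimize, e.g. ROM + RAM footprint
    mflops: float     # minimize

def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = (a.accuracy >= b.accuracy and a.memory_kb <= b.memory_kb
                and a.mflops <= b.mflops)
    strictly_better = (a.accuracy > b.accuracy or a.memory_kb < b.memory_kb
                       or a.mflops < b.mflops)
    return no_worse and strictly_better

def pareto_front(cands: List[Candidate],
                 memory_budget_kb: Optional[float] = None) -> List[Candidate]:
    """Non-dominated candidates, optionally restricted to those within a memory budget."""
    if memory_budget_kb is not None:
        cands = [c for c in cands if c.memory_kb <= memory_budget_kb]
    return [c for c in cands if not any(dominates(o, c) for o in cands)]

if __name__ == "__main__":
    # Illustrative numbers only; not results from the paper.
    cands = [
        Candidate("a", 0.91, 900.0, 40.0),
        Candidate("b", 0.89, 300.0, 12.0),
        Candidate("c", 0.88, 350.0, 15.0),  # dominated by "b"
        Candidate("d", 0.80, 120.0, 5.0),
    ]
    print([c.name for c in pareto_front(cands)])         # ['a', 'b', 'd']
    print([c.name for c in pareto_front(cands, 512.0)])  # ['b', 'd']
```

Applying the budget before the dominance check matters: model "a" is Pareto-optimal overall but drops out once it no longer fits the assumed 512 kB limit.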

Paper Structure

This paper contains 16 sections, 3 equations, 8 figures, 5 tables, and 1 algorithm.

Figures (8)

  • Figure 1: Overview of our proposed RL-based multi-objective Bayesian DNN hyperparameter optimization approach for microcontroller targets.
  • Figure 2: ARS-MOBOpt compared with several baseline approaches (top row: MobileNetV3 on DaLiAc; bottom row: ResNet on CIFAR-10). Left column: our approach (ARS-MOBOpt) outperforms all others in terms of Hypervolume after 50 to 75 evaluated samples. Remaining columns: feasible Pareto sets found by ARS-MOBOpt compared with other Bayesian (ParEGO, TurBO, MorBO) and evolutionary (NSGA-II) approaches, with random sampling (Random) as a baseline.
  • Figure 3: Effect of the three key ARS hyperparameters on the achieved Hypervolume (5 seeds each). Setup: MobileNetV3 with 1.6M initial parameters, DaLiAc dataset, window length 1024.
  • Figure 4: Topography of the optimization landscape (top row) and expected improvement (EI, bottom row) estimated by the Bayesian surrogates for the synthetic optimization problem described in the Appendix, after 40 samples, for both ParEGO and ARS (ours), given two sets of priors marked with red triangles. The global minimum is at $\theta_0 = 15$, $\theta_1 = 5$. For ARS, the rollouts of the trained policies at the 40th sample are shown as lines, with their starting points marked by red crosses.
  • Figure 5: Comparison of MOBOpt with ARS and PPO for ResNet on CIFAR-10 and MobileNetV3 on DaLiAc, with random sampling as the baseline in both cases. The results for ARS and Random are the same as those in Fig. 2. Because PPO with its default hyperparameters did not outperform random sampling for ResNet, we pre-optimized its hyperparameters over 20 trials, see (a). (b) shows the Hypervolume results for ARS-MOBOpt and pre-optimized PPO. (c) shows the need for problem-specific hyperparameter pre-optimization for PPO: the green curve is obtained when the settings pre-optimized for ResNet in (a) are applied to MobileNetV3.
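
Figures 2 and 3 compare methods by Hypervolume, the area (or volume) of objective space that a Pareto set dominates, bounded by a reference point. For two objectives that are both minimized, it can be computed exactly by sorting the points and sweeping once. The sketch below assumes two minimized objectives (for example, error rate and memory) and a user-chosen reference point; the paper's actual objectives, normalization, and reference point are not given in this summary, and three or more objectives would need a dedicated algorithm or library. `hypervolume_2d` is a hypothetical helper name.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]

def hypervolume_2d(points: Iterable[Point], ref: Point) -> float:
    """Area dominated by `points` and bounded by `ref`; both objectives are minimized."""
    # Only points strictly better than the reference point contribute.
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    hv = 0.0
    prev_y = ref[1]
    for x, y in pts:  # sweep in order of increasing first objective
        if y < prev_y:  # skip points dominated by one already swept
            hv += (ref[0] - x) * (prev_y - y)  # add a horizontal slab
            prev_y = y
    return hv

# Example: a three-point front and one dominated point, reference (4, 4).
print(hypervolume_2d([(1, 3), (2, 2), (3, 1), (3, 3)], (4, 4)))  # 6.0
```

Recomputing this value after each evaluated sample, with the reference point held fixed, gives Hypervolume-versus-samples curves of the kind plotted in the left column of Figure 2.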

Theorems & Definitions (3)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3