MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks

Lotfi Abdelkrim Mecharbat, Alberto Marchisio, Muhammad Shafique, Mohammad M. Ghassemi, Tuka Alhanai

TL;DR

This work addresses fairness, robustness, and generalization gaps in edge DNNs, showing that state-of-the-art designs exhibit significant accuracy disparities across skin tones and are sensitive to lighting. It proposes MoENAS, a mixture-of-experts neural architecture search that replaces FFN layers with Switch FFN modules, uses Bayesian optimization to search the number of experts per layer against four objectives (accuracy, fairness, robustness, generalization) under a model-size constraint, and prunes underused experts to preserve efficiency. Empirical results on COCO and FACET show that MoENAS improves accuracy by 4.02%, reduces skin-tone accuracy disparity from 14.09% to 5.60%, improves robustness by 3.80%, and reduces overfitting to 0.21%, at a cost of roughly 0.4M additional parameters relative to the average size of state-of-the-art models, establishing a new Pareto frontier for edge DNNs. The paper also discusses ablations, limitations, and future directions, including latency and explainability, and explores a quantum-head variant as a potential future extension.

Abstract

There has been a surge in optimizing edge Deep Neural Networks (DNNs) for accuracy and efficiency using traditional optimization techniques such as pruning and, more recently, automatic design methodologies. However, these design techniques have often overlooked critical metrics such as fairness, robustness, and generalization. As a result, when evaluating the image-classification performance of SOTA edge DNNs on the FACET dataset, we found that they exhibit significant accuracy disparities (14.09%) across 10 different skin tones, alongside issues of non-robustness and poor generalizability. In response to these observations, we introduce Mixture-of-Experts-based Neural Architecture Search (MoENAS), an automatic design technique that navigates a space of mixtures of experts to discover accurate, fair, robust, and general edge DNNs. MoENAS improves accuracy by 4.02% compared to SOTA edge DNNs and reduces the skin-tone accuracy disparity from 14.09% to 5.60%, while enhancing robustness by 3.80% and reducing overfitting to 0.21%, all while keeping model size close to the average size of state-of-the-art models (+0.4M). With these improvements, MoENAS establishes a new benchmark for edge DNN design, paving the way for the development of more inclusive and robust edge DNNs.
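
The disparity figures quoted above can be read as a gap in per-group accuracy. The abstract does not define the metric, so the following minimal sketch assumes the gap between the best-served and worst-served skin-tone groups, in percentage points. The function name `accuracy_disparity`, the integer tone labels, and the toy data are all hypothetical and not taken from the paper.

```python
from collections import defaultdict


def accuracy_disparity(labels, predictions, groups):
    """Return (per-group accuracy in %, max-min accuracy gap in points).

    Assumption: disparity = best-group accuracy minus worst-group accuracy.
    The paper may define the metric differently.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, y_hat, g in zip(labels, predictions, groups):
        total[g] += 1
        correct[g] += int(y == y_hat)
    per_group = {g: 100.0 * correct[g] / total[g] for g in total}
    gap = max(per_group.values()) - min(per_group.values())
    return per_group, gap


# Toy data (hypothetical): two skin-tone groups, four samples each.
labels      = [0, 1, 1, 0, 1, 0, 0, 1]
predictions = [0, 1, 1, 0, 1, 0, 1, 0]
groups      = [1, 1, 1, 1, 2, 2, 2, 2]
per_group, gap = accuracy_disparity(labels, predictions, groups)
```

Under this reading, a reduction from 14.09% to 5.60% would mean the worst-served group moves closer to the best-served one, which is a different quantity from overall accuracy.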

Paper Structure

This paper contains 31 sections, 11 figures, 1 table.

Figures (11)

  • Figure 1: Evaluation of Fairness, Robustness, and Generalization of SOTA edge DNNs on FACET.
  • Figure 2: Test Accuracy vs. Skin Fairness of SOTA edge DNNs: Models sharing the same architecture are connected by straight lines. The Pareto front is illustrated with a dashed line.
  • Figure 3: Summary of MoENAS contributions: (1) Replace the FFN layer with a Switch FFN layer; (2) Search within the expert mixing space for optimal architectures (according to accuracy, fairness, robustness, and generalization); (3) Prune based on expert importance for better efficiency.
  • Figure 4: Overview of MoENAS Methodology. This figure illustrates the three core steps of the MoENAS approach: (1) Replace the standard FFN layer (grey) with a Switch FFN layer (colored rectangles denote the experts); (2) Run a search process within the expert mixing space to identify optimal expert combinations for accuracy, fairness, and robustness; (3) Prune the least-used experts to ensure model compactness and efficiency while maintaining high performance.
  • Figure 5: Detailed view of the Attention Block with Switch FFN Layer. This layer is based on MoE: each feature vector is processed by one FFN layer (expert) from the FFN set (expert set), chosen by the router.
  • ...and 6 more figures
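
The routing and pruning steps named in the captions for Figures 4 and 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes top-1 routing with the expert output scaled by its router probability (as in the Switch Transformer), and it assumes that "expert importance" means how many tokens were routed to each expert. The class names `Expert` and `SwitchFFN`, the method `prune_least_used`, and all dimensions are hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class Expert:
    """A tiny two-layer FFN: W2 @ relu(W1 @ x)."""

    def __init__(self, dim, hidden, rng):
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(hidden)]
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(dim)]

    def __call__(self, x):
        h = [max(0.0, v) for v in matvec(self.W1, x)]
        return matvec(self.W2, h)


class SwitchFFN:
    """Top-1 mixture-of-experts layer that also counts expert usage."""

    def __init__(self, dim, hidden, num_experts, seed=0):
        rng = random.Random(seed)
        # One router row per expert; router logits = router @ x.
        self.router = [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
                       for _ in range(num_experts)]
        self.experts = [Expert(dim, hidden, rng) for _ in range(num_experts)]
        self.usage = [0] * num_experts

    def forward(self, tokens):
        outputs = []
        for x in tokens:
            probs = softmax(matvec(self.router, x))
            k = max(range(len(probs)), key=probs.__getitem__)  # top-1 expert
            self.usage[k] += 1
            # Scale the chosen expert's output by its router probability.
            outputs.append([probs[k] * v for v in self.experts[k](x)])
        return outputs

    def prune_least_used(self, keep):
        """Keep the `keep` most-used experts (assumed importance = usage count)."""
        ranked = sorted(range(len(self.experts)),
                        key=lambda i: self.usage[i], reverse=True)
        kept = sorted(ranked[:keep])
        self.router = [self.router[i] for i in kept]
        self.experts = [self.experts[i] for i in kept]
        self.usage = [self.usage[i] for i in kept]
        return kept


# Usage: route 32 random 4-dimensional feature vectors, then prune to 2 experts.
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(32)]
layer = SwitchFFN(dim=4, hidden=8, num_experts=4)
outputs = layer.forward(tokens)
kept = layer.prune_least_used(keep=2)
pruned_outputs = layer.forward(tokens)
```

In a search setting, the number of experts per layer would be the variable being tuned, and pruning removes both the expert weights and the matching router row so that the remaining experts can still be selected.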