Memory-Optimized Once-For-All Network

Maxime Girard, Victor Quétu, Samuel Tardieu, Van-Tam Nguyen, Enzo Tartaglione

TL;DR

This work addresses the memory inefficiency of the Once-For-All (OFA) supernet when deploying DNNs on memory-constrained hardware. It introduces Memory-Optimized OFA (MOOFA), a memory-balanced supernet that decomposes per-layer memory and optimizes channel sizes to maintain a memory-constant profile across stages, paired with a constrained search pipeline using a memory logger and an accuracy predictor. On ImageNet, MOOFA improves memory exploitation and can achieve higher accuracy under tight memory budgets compared to OFA and CompOFA, at the cost of higher FLOPs. The results demonstrate that deliberate memory distribution can enhance generalization and deployment versatility, with public code available for reproduction.

Abstract

Deploying Deep Neural Networks (DNNs) on different hardware platforms is challenging due to varying resource constraints. Besides handcrafted approaches aimed at making deep models hardware-friendly, Neural Architecture Search is emerging as a toolbox for crafting more efficient DNNs without sacrificing performance. Among these, the Once-For-All (OFA) approach offers a solution by allowing the sampling of well-performing sub-networks from a single supernet -- this leads to evident advantages in terms of computation. However, OFA does not fully utilize the memory capacity of the target device, focusing instead on limiting maximum memory usage per layer. This leaves unexploited potential in terms of model generalizability. In this paper, we introduce a Memory-Optimized OFA (MOOFA) supernet, designed to enhance DNN deployment on resource-limited devices by maximizing memory usage (and, for instance, feature diversity) across different configurations. Tested on ImageNet, our MOOFA supernet demonstrates improvements in memory exploitation and model accuracy compared to the original OFA supernet. Our code is available at https://github.com/MaximeGirard/memory-optimized-once-for-all.
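
To make the memory argument concrete, the following is a minimal sketch of why limiting only the per-layer peak can leave later layers far below the device budget. It is not the authors' method or code: the stage shapes, the cost model (input plus output feature maps held at once, float32) and the helper names `layer_memory` and `channels_within_budget` are all assumptions chosen for illustration.

```python
# Toy illustration only: the stage shapes are hypothetical and do NOT
# describe the real OFA or MOOFA architectures.

BYTES_PER_VALUE = 4  # assumption: float32 activations


def layer_memory(resolution, in_channels, out_channels):
    """Assumed per-layer cost: input and output feature maps held together.

    Assumes a square feature map and a stride-1 layer, so both maps share
    the same spatial resolution.
    """
    pixels = resolution * resolution
    return pixels * (in_channels + out_channels) * BYTES_PER_VALUE


def channels_within_budget(resolution, budget_bytes):
    """Total (input + output) channels that fit in the budget at a resolution."""
    return budget_bytes // (resolution * resolution * BYTES_PER_VALUE)


# Hypothetical stages: (spatial resolution, input channels, output channels).
stages = [(112, 16, 24), (56, 24, 40), (28, 40, 80), (14, 80, 160)]

profile = [layer_memory(r, c_in, c_out) for r, c_in, c_out in stages]
peak = max(profile)

# Fraction of the peak that each stage actually uses.
utilization = [memory / peak for memory in profile]

# Channel count each stage could hold if it used the full peak budget.
headroom = [channels_within_budget(r, peak) for r, _, _ in stages]

for (r, c_in, c_out), used, fits in zip(stages, utilization, headroom):
    print(f"res={r:3d} channels={c_in + c_out:4d} "
          f"uses {used:.0%} of peak, budget would fit {fits} channels")
```

In this toy setting the first stage sets the peak and every later stage uses a shrinking fraction of it, because halving the resolution divides the feature-map size by four while the channel count grows more slowly. The `headroom` values show how many channels a memory-constant profile could afford at each resolution; how MOOFA actually selects channel sizes is described in the paper, not here.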

Paper Structure (19 sections, 9 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: We propose a revised architecture of the OFA supernet aiming at providing memory-optimized subnets with improved accuracy under tight memory constraints.
  • Figure 2: The Once-for-All production pipeline. A supernet is trained once, from which multiple subnets, at specific hardware constraints, can be drawn.
  • Figure 3: (a) Memory usage of the original OFA network during a forward pass and (b) internal structure of the OFA supernet. A memory peak resulting from the initial blocks significantly restricts the submodels produced by OFA. The blocks that come after these initial ones therefore use much less memory.
  • Figure 4: Top-1 accuracy under different memory constraints for three configurations trained on Imagenette. Configuration 1 uses the expansion ratio set from the original OFA ([3,4,5]). Configurations 2 and 3 use [1,1.5,2] and [2,3,4], respectively, as their expansion ratio sets.
  • Figure 5: Memory usage for a forward pass on OFA and MOOFA. With or without a memory constraint, the original OFA architecture (a, c) presents a memory peak resulting from the initial blocks, whereas our MOOFA architecture equalizes memory usage across the entire network (b, d).
  • ...and 2 more figures