Lightweight Deep Learning for Resource-Constrained Environments: A Survey

Hou-I Liu, Marco Galindo, Hongxia Xie, Lai-Kuan Wong, Hong-Han Shuai, Yung-Hui Li, Wen-Huang Cheng

TL;DR

This survey addresses the challenge of deploying deep learning in resource-constrained environments through three complementary approaches: lightweight neural network design, model compression, and hardware acceleration. It covers a broad range of lightweight CNN- and transformer-based architectures; analyzes pruning, quantization, knowledge distillation (KD), and neural architecture search (NAS) as compression techniques; and reviews hardware accelerators, dataflow, and software libraries for edge deployment. The authors also discuss challenges and future directions in TinyML and in edge-friendly lightweight LLMs and diffusion models, emphasizing hardware-software co-design. The work offers guidance on selecting architectures and compression strategies for specific hardware and application contexts, connecting design choices from the model level to the system level and outlining practical paths toward real-world edge AI deployment. These insights are relevant to developers and researchers implementing efficient DL on mobile, embedded, and IoT platforms while balancing accuracy, latency, and energy efficiency.

Abstract

Over the past decade, the dominance of deep learning has prevailed across various domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While there have been remarkable improvements in model accuracy, deploying these models on lightweight devices, such as mobile phones and microcontrollers, is constrained by limited resources. In this survey, we provide comprehensive design guidance tailored for these devices, detailing the meticulous design of lightweight models, compression methods, and hardware acceleration strategies. The principal goal of this work is to explore methods and concepts for getting around hardware constraints without compromising the model's accuracy. Additionally, we explore two notable paths for lightweight deep learning in the future: deployment techniques for TinyML and Large Language Models. Although these paths undoubtedly have potential, they also present significant challenges, encouraging research into unexplored areas.

Paper Structure

This paper contains 66 sections, 3 equations, 11 figures, and 6 tables.

Figures (11)

  • Figure 1: Comparison of DenseNet, CondenseNet, and CondenseNetV2. Active weight connections are represented by solid color arrows, and pruned weight connections are represented by gray dashed arrows.
  • Figure 2: The variant of shift-based convolution [chen2019all].
  • Figure 3: Standard Vision Transformer, where $P = h \times w$, and $h$ and $w$ denote the height and width of the images. $N$ is the number of image patches, $L$ is the number of transformer blocks, and $d$ is the dimension [mehta2021mobilevit].
  • Figure 4: Illustration of pruning methods: unstructured pruning (left) and structured pruning (right). Pruned components are shown in white. Note the change in the output dimensions of the pruned component.
  • Figure 5: Symmetric (left) and asymmetric (right) quantization representation [gholami2022survey]. Here $r$ is the real value, $S$ is the real-valued scaling factor, and $Z$ is the integer zero point.
  • ...and 6 more figures
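
The Figure 5 caption names three quantities: a real value r, a real-valued scaling factor S, and an integer zero point Z. As a rough illustration of how these fit together, the sketch below uses a common uniform-quantization convention, q = clamp(round(r / S) + Z), with the approximate inverse r ≈ S · (q − Z). This convention, the 8-bit default, the restricted symmetric range, and all function names are assumptions made for illustration; they are not taken from the survey, whose own equations may use a different sign for Z.

```python
def quantize(values, scale, zero_point, qmin, qmax):
    """Map real values r to integers q = clamp(round(r / S) + Z, qmin, qmax)."""
    return [max(qmin, min(qmax, round(r / scale) + zero_point)) for r in values]


def dequantize(q_values, scale, zero_point):
    """Recover approximate real values r ~ S * (q - Z)."""
    return [scale * (q - zero_point) for q in q_values]


def symmetric_params(values, bits=8):
    """Symmetric scheme: the real range is centred on zero, so Z = 0."""
    qmax = 2 ** (bits - 1) - 1           # 127 for 8 bits
    qmin = -qmax                         # restricted range keeps the grid symmetric
    max_abs = max(abs(r) for r in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return scale, 0, qmin, qmax


def asymmetric_params(values, bits=8):
    """Asymmetric scheme: [min, max] is mapped onto [0, 2^bits - 1], so Z != 0 in general."""
    qmin, qmax = 0, 2 ** bits - 1
    lo = min(min(values), 0.0)           # widen the range so 0.0 maps to an exact integer
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = qmin - round(lo / scale)
    return scale, zero_point, qmin, qmax


# Hypothetical example tensor, flattened to a list of floats.
weights = [-0.5, 0.0, 0.25, 1.5]

s_sym, z_sym, lo_sym, hi_sym = symmetric_params(weights)
q_sym = quantize(weights, s_sym, z_sym, lo_sym, hi_sym)

s_asym, z_asym, lo_asym, hi_asym = asymmetric_params(weights)
q_asym = quantize(weights, s_asym, z_asym, lo_asym, hi_asym)
restored = dequantize(q_asym, s_asym, z_asym)
```

For skewed data such as the example above, the asymmetric scheme spends the whole integer range on the interval actually occupied by the values, while the symmetric scheme leaves part of the negative range unused in exchange for a zero point of 0, which tends to simplify integer arithmetic.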