
Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision

Xiangzhong Luo, Di Liu, Hao Kong, Shuo Huai, Hui Chen, Guochu Xiong, Weichen Liu

TL;DR

This survey addresses the growing gap between compute-heavy DNNs and resource-constrained embedded systems by reviewing efficient deep learning (DL) infrastructures that span training to inference and manual to automated design. It synthesizes approaches across seven pillars: manual network design, automated design via neural architecture search (NAS), network compression, on-device learning, efficient LLMs, DL software/hardware frameworks, and embedded applications. It also outlines future directions toward ubiquitous embedded intelligence. Its contribution lies in collating state-of-the-art techniques for CNNs, Transformers, and LLMs at the edge, including hardware-aware NAS, pruning, quantization, distillation, on-device training paradigms, and system-level infrastructures. Together, these techniques point to practical pathways for deploying powerful yet efficient AI on embedded devices, with implications for mobile, automotive, and IoT ecosystems.

Abstract

Deep neural networks (DNNs) have recently achieved impressive success across a wide range of real-world vision and language processing tasks, from image classification to downstream vision tasks such as object detection, tracking, and segmentation. However, well-established DNNs, while maintaining superior accuracy, have also grown deeper and wider, and thus demand prohibitive computational resources for both training and inference. This trend widens the computational gap between computation-intensive DNNs and resource-constrained embedded computing systems, making it challenging to deploy powerful DNNs on real-world embedded computing systems and achieve ubiquitous embedded intelligence. To alleviate this gap, this survey discusses recent efficient deep learning infrastructures for embedded computing systems, spanning from training to inference, from manual to automated design, from convolutional neural networks to transformers, from transformers to vision transformers, from vision models to large language models, from software to hardware, and from algorithms to applications. Specifically, we examine these infrastructures through the lens of (1) efficient manual network design, (2) efficient automated network design, (3) efficient network compression, (4) efficient on-device learning, (5) efficient large language models, (6) efficient deep learning software and hardware, and (7) efficient intelligent applications, each targeting embedded computing systems.

Paper Structure

This paper contains 48 sections, 20 equations, 30 figures, 4 tables.

Figures (30)

  • Figure 1: The organization of this paper, in which we omit the introduction and conclusion sections for simplicity.
  • Figure 2: Comparisons between the standard convolution (left) and the Ghost convolution (right) of GhostNets [han2020ghostnet, tang2022ghostnetv2, han2022ghostnets]. In particular, compared with the standard convolutional layer, the Ghost convolutional layer can generate rich features using simple and cheap linear operations. (Figure from [han2020ghostnet].)
  • Figure 3: Comparisons of the efficient convolutional networks discussed in the section on manual convolutional neural network design, including SqueezeNet [iandola2016squeezenet], MobileNets [howard2017mobilenets, sandler2018mobilenetv2, zhou2020mobilenext], ShuffleNets [zhang2018shufflenet, ma2018shufflenet], CondenseNets [huang2018condensenet, yang2021condensenet], and GhostNets [han2020ghostnet, han2022ghostnets, tang2022ghostnetv2]. Accuracy is evaluated on ImageNet [deng2009imagenet] and taken from the respective papers. Note that the networks in this figure may have been trained under different training recipes.
  • Figure 4: Illustration of the key milestones of the transformer, which was originally applied to NLP tasks and has recently gained increasing popularity in the vision community. Vision transformers are marked in red.
  • Figure 5:
  • ...and 25 more figures
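The efficiency claim in Figure 2 can be checked with a rough cost count. The sketch below is a back-of-the-envelope calculation, not the GhostNet implementation: it assumes a primary k×k convolution that produces c_out/s intrinsic feature maps, followed by (s − 1) cheap d×d depthwise linear operations per intrinsic map, which is our reading of the cost model described in [han2020ghostnet]. The names `ConvShape`, `standard_conv_macs`, and `ghost_conv_macs`, as well as the example layer sizes, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvShape:
    """Hypothetical description of one convolutional layer."""
    c_in: int   # input channels
    c_out: int  # output channels
    h_out: int  # output height
    w_out: int  # output width
    k: int      # kernel size of the (primary) convolution


def standard_conv_macs(shape: ConvShape) -> int:
    """Multiply-accumulates (MACs) of a standard k x k convolution."""
    return shape.c_out * shape.h_out * shape.w_out * shape.c_in * shape.k * shape.k


def ghost_conv_macs(shape: ConvShape, s: int, d: int) -> int:
    """MACs of a Ghost-style layer (assumed cost model, see lead-in).

    A primary k x k convolution produces m = c_out / s intrinsic maps; each
    intrinsic map then yields (s - 1) extra "ghost" maps through a cheap
    d x d depthwise (per-channel) linear operation.
    """
    if s < 1 or shape.c_out % s != 0:
        raise ValueError("c_out must be divisible by s, with s >= 1")
    m = shape.c_out // s
    primary = m * shape.h_out * shape.w_out * shape.c_in * shape.k * shape.k
    cheap = (s - 1) * m * shape.h_out * shape.w_out * d * d
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer: 128 -> 256 channels, 56 x 56 output, 3 x 3 kernel.
    layer = ConvShape(c_in=128, c_out=256, h_out=56, w_out=56, k=3)
    std = standard_conv_macs(layer)
    ghost = ghost_conv_macs(layer, s=2, d=3)
    print(f"standard: {std:,} MACs")
    print(f"ghost:    {ghost:,} MACs")
    print(f"speed-up: {std / ghost:.3f}x")
```

When d = k, the ratio simplifies to s·c_in / (s + c_in − 1), which approaches s when c_in is much larger than s. For the hypothetical layer above (s = 2, c_in = 128), this gives 256/129 ≈ 1.98, so under these assumptions the Ghost layer needs roughly half the MACs of the standard convolution.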