
Can LLMs Revolutionize the Design of Explainable and Efficient TinyML Models?

Christophe El Zeinaty, Wassim Hamidouche, Glenn Herrou, Daniel Menard, Merouane Debbah

TL;DR

The paper presents a framework for TinyML model design that combines an LLM-guided neural architecture search with a Pareto-optimized feedback loop, a ViT-based knowledge distillation (KD) stage, and an explainability (XAI) module. It targets CIFAR-100 classification on memory-constrained STM32 MCUs and reaches up to 74.50% accuracy, above SOTA baselines such as MCUNet-in4 and XiNet, with fewer than 100M MACs and within the 320 KB SRAM budget. The approach reports reduced search time and memory usage compared to prior NAS methods, while KD from a ViT teacher improves generalization. Together, LLM-driven search, multi-objective optimization, KD, and XAI offer a scalable path to efficient, accurate, and interpretable TinyML architectures.
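
The Pareto-optimized feedback loop keeps only candidates that no other candidate beats on all three objectives at once. Below is a minimal sketch of that non-dominated filter, assuming each candidate has already been scored for accuracy, MACs, and peak SRAM. The `Candidate` record, the helper names, and the example numbers are hypothetical and are not taken from the paper.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one lightly trained candidate architecture."""
    name: str
    accuracy: float  # top-1 accuracy in %, higher is better
    macs_m: float    # millions of MACs, lower is better
    sram_kb: float   # peak SRAM in KB, lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.macs_m <= b.macs_m and a.sram_kb <= b.sram_kb
    strictly_better = a.accuracy > b.accuracy or a.macs_m < b.macs_m or a.sram_kb < b.sram_kb
    return no_worse and strictly_better


def pareto_front(cands: list[Candidate]) -> list[Candidate]:
    """Keep candidates that no other candidate dominates (a candidate never dominates itself)."""
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


# Illustrative pool with made-up scores.
pool = [
    Candidate("cand_a", 70.0, 90.0, 250.0),
    Candidate("cand_b", 68.0, 60.0, 200.0),
    Candidate("cand_c", 67.0, 95.0, 300.0),  # worse than cand_a on all three objectives
]
print([c.name for c in pareto_front(pool)])
```

Here `cand_a` and `cand_b` both survive because each wins on at least one objective, which is exactly the trade-off information a feedback loop could hand back to the LLM for the next round.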

Abstract

This paper introduces a novel framework for designing efficient neural network architectures specifically tailored to tiny machine learning (TinyML) platforms. By leveraging large language models (LLMs) for neural architecture search (NAS), a vision transformer (ViT)-based knowledge distillation (KD) strategy, and an explainability module, the approach strikes an optimal balance between accuracy, computational efficiency, and memory usage. The LLM-guided search explores a hierarchical search space, refining candidate architectures through Pareto optimization based on accuracy, multiply-accumulate operations (MACs), and memory metrics. The best-performing architectures are further fine-tuned using logits-based KD with a pre-trained ViT-B/16 model, which enhances generalization without increasing model size. Evaluated on the CIFAR-100 dataset and deployed on an STM32H7 microcontroller (MCU), the three proposed models, LMaNet-Elite, LMaNet-Core, and QwNet-Core, achieve accuracy scores of 74.50%, 74.20% and 73.00%, respectively. All three models surpass current state-of-the-art (SOTA) models, such as MCUNet-in3/in4 (69.62% / 72.86%) and XiNet (72.27%), while maintaining a low computational cost of less than 100 million MACs and adhering to the stringent 320 KB static random-access memory (SRAM) constraint. These results demonstrate the efficiency and performance of the proposed framework for TinyML platforms, underscoring the potential of combining LLM-driven search, Pareto optimization, KD, and explainability to develop accurate, efficient, and interpretable models. This approach opens new possibilities in NAS, enabling the design of efficient architectures specifically suited for TinyML.
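
The abstract specifies logits-based KD from a pre-trained ViT-B/16 teacher but not the exact loss. As a hedged sketch, the standard Hinton-style formulation is shown below. It is an assumption, not confirmed as the authors' choice, and it blends hard-label cross-entropy with a temperature-softened KL term. The `temperature` and `alpha` defaults are illustrative, not the paper's hyperparameters.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax of logits scaled by 1/temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits: list[float], teacher_logits: list[float], label: int,
            temperature: float = 4.0, alpha: float = 0.5) -> float:
    """Assumed loss: alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student)."""
    # Hard-label cross-entropy on the student's unscaled logits.
    ce = -math.log(softmax(student_logits)[label])
    # Temperature-softened distributions; the frozen teacher only supplies targets.
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(t * (math.log(t) - math.log(s)) for t, s in zip(p_t, p_s) if t > 0)
    # The T^2 factor keeps soft-target gradients comparable in scale across temperatures.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


teacher = [2.0, 0.0, 0.0]  # made-up 3-class logits
print(kd_loss(teacher, teacher, label=0))           # KL term vanishes, only CE remains
print(kd_loss([0.0, 2.0, 0.0], teacher, label=0))   # student disagrees, so both terms grow
```

Because only the student's training objective changes, the deployed model keeps the same size and MAC count, which matches the abstract's point that KD improves generalization without increasing model size.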

Paper Structure

This paper contains 25 sections, 12 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Our proposed architectures balance accuracy and efficiency, reducing model size and computational cost compared to baselines. Marker size and color indicate model size; XiNet-Class is shown as a triangle because its model size is not reported.
  • Figure 2: Overview of the LLM-guided architecture search process with integrated explainability. The LLM generates candidate architectures, which are validated against MAC and SRAM constraints. Valid candidates undergo lightweight training, followed by Pareto optimization to evaluate trade-offs between accuracy, computational cost, and memory usage. During the process, the LLM provides explanations for its design choices, enhancing interpretability and guiding further iterations. The best candidates are fully trained and refined using ViT-based knowledge distillation to produce the final deployable model.
  • Figure 3: The LLM defines the configuration parameters for each stage, including the number of layers $L_i$, kernel size $K$, stride $S$, activation $A$, convolution type $Conv\ Block$, expansion ratio $E$, and output channels $C_{\text{out}}$. It also selects building blocks such as DWSepConv and MBConv and controls whether SE blocks and skip connections are included.
  • Figure 4: Example of initial prompt used to guide LLM architecture generation.
  • Figure 5: SRAM usage comparison of different models deployed on the STM32H7. The purple dashed line marks the 320 KB SRAM constraint. Each bar shows a model's SRAM usage (KB), annotated with its ratio relative to LMaNet-Elite, which has the lowest peak memory consumption.
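
Figure 2 validates each LLM-generated candidate against MAC and SRAM limits before any training, and Figure 3 lists the per-stage parameters the LLM controls. The following is a rough sketch of such a pre-training filter. It assumes a 32×32 RGB input, a 3×3 stem, and int8 activations. The `StageConfig` field names, the stem, and the peak-memory heuristic are illustrative assumptions, not the paper's estimator or the STM32 toolchain's memory report. The heuristic takes the largest simultaneously live input/output activation pair and ignores weights, buffers, and runtime overhead.

```python
from dataclasses import dataclass

MAC_BUDGET = 100_000_000   # "less than 100 million MACs" (abstract)
SRAM_BUDGET = 320 * 1024   # 320 KB SRAM budget of the STM32H7 target


@dataclass
class StageConfig:
    """One stage of a candidate network; fields mirror Figure 3 (names are hypothetical)."""
    layers: int            # L_i: number of blocks in the stage
    kernel: int            # K: depthwise kernel size
    stride: int            # S: applied to the first block of the stage only
    activation: str        # A: e.g. "relu" (does not affect the estimate)
    conv_block: str        # "dwsepconv" or "mbconv"
    expansion: int         # E: channel expansion, used by "mbconv" only
    out_channels: int      # C_out
    use_se: bool = False   # SE block: ignored by this rough estimate
    skip: bool = False     # skip connection: adds no MACs here


def estimate(stages, input_hw=32, stem_channels=16):
    """Rough MACs and peak int8 activation bytes for a 32x32 RGB input (CIFAR-100)."""
    h = w = input_hw
    c_in = stem_channels
    macs = h * w * 3 * stem_channels * 9           # assumed 3x3 stride-1 stem conv
    peak = h * w * (3 + stem_channels)             # stem input + output, 1 byte each
    for cfg in stages:
        for i in range(cfg.layers):
            s = cfg.stride if i == 0 else 1
            h_out, w_out = -(-h // s), -(-w // s)  # ceil division ("same" padding)
            expand = cfg.conv_block == "mbconv" and cfg.expansion > 1
            c_mid = c_in * cfg.expansion if expand else c_in
            if expand:
                macs += h * w * c_in * c_mid                  # 1x1 expansion conv
                peak = max(peak, h * w * (c_in + c_mid))      # input + expanded map
            macs += h_out * w_out * c_mid * cfg.kernel ** 2   # KxK depthwise conv
            macs += h_out * w_out * c_mid * cfg.out_channels  # 1x1 projection conv
            peak = max(peak, h * w * c_in + h_out * w_out * cfg.out_channels)
            h, w, c_in = h_out, w_out, cfg.out_channels
    return macs, peak


def is_valid(stages):
    """Reject a candidate before training if it exceeds either budget."""
    macs, peak = estimate(stages)
    return macs < MAC_BUDGET and peak <= SRAM_BUDGET


candidate = [
    StageConfig(1, 3, 1, "relu", "dwsepconv", 1, 16),
    StageConfig(2, 3, 2, "relu", "mbconv", 4, 24),
    StageConfig(2, 5, 2, "relu", "mbconv", 4, 40, use_se=True, skip=True),
]
print(estimate(candidate), is_valid(candidate))
```

Running such a cheap analytical check first means only candidates that can plausibly fit on the MCU consume training time, which is where most of the search cost lies.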