TinyML for Speech Recognition

Andrew Barovic, Armin Moin

TL;DR

The paper tackles on-device speech recognition for ultra-constrained IoT devices by deploying a quantized 1D-CNN on an Arduino Nano 33 BLE Sense. The model, trained with Edge Impulse, recognizes 23 commands using MFCC features computed on the device together with wake-word logic. The paper introduces a new open dataset recorded with the board's onboard microphone and reports strong on-device performance, with an average accuracy of 0.97 and an average F1 score of 0.98 across commands. The study discusses practical TinyML deployment considerations, presents a chunk-based inference framework, and validates the results on real hardware, while acknowledging biases and platform-specific limitations. Future work aims to broaden data diversity, finalize command behaviors, and explore transfer and federated learning to reduce reliance on external tooling.
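The TL;DR mentions MFCC features without further detail. As an illustration only, the following is a minimal NumPy sketch of a textbook MFCC pipeline: framing, Hamming window, power spectrum, triangular mel filterbank, log, and DCT-II. The frame length, stride, FFT size, filter count, and coefficient count are assumptions chosen for the example, not the settings used in the paper or in Edge Impulse's processing block; pre-emphasis and normalization are omitted.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel formulas exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters evenly spaced on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge of the triangle
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=0.02, frame_step=0.02,
         n_fft=512, n_filters=32, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix; all defaults are illustrative."""
    fl, fs = int(round(frame_len * sr)), int(round(frame_step * sr))
    n_frames = 1 + (len(signal) - fl) // fs
    frames = np.stack([signal[i * fs:i * fs + fl] for i in range(n_frames)])
    frames = frames * np.hamming(fl)
    # Power spectrum of each windowed frame (zero-padded to n_fft).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    log_energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n + 1) / (2 * n_filters))
    return log_energies @ basis.T
```

With these assumed settings, one second of 16 kHz audio yields 50 non-overlapping 20 ms frames, each summarized by 13 coefficients, which is the kind of compact 2D feature map a 1D-CNN can consume.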

Abstract

We train and deploy a quantized 1D convolutional neural network model to perform speech recognition on a highly resource-constrained Internet of Things (IoT) edge device. Such a system is useful in a range of IoT applications, for example smart homes and ambient assisted living for the elderly and people with disabilities. In this paper, we first create a new dataset with over one hour of audio data; it underpins this research and should be useful to future studies in this field. Second, we use the technologies provided by Edge Impulse to improve our model's performance, achieving an accuracy of up to 97% on our dataset. For validation, we implement our prototype on the Arduino Nano 33 BLE Sense microcontroller board, which is designed for IoT and AI applications and is therefore well suited to our target use cases. While most existing research focuses on a limited set of keywords, our model recognizes 23 different keywords, which enables complex commands.
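The abstract does not state which quantization scheme the model uses. As a minimal sketch, assuming per-tensor affine int8 quantization (the scheme commonly used by TensorFlow Lite for Microcontrollers), a real value x is stored as q = round(x / scale) + zero_point and recovered as (q - zero_point) * scale. The helper names below are hypothetical.

```python
import numpy as np

INT8_MIN, INT8_MAX = -128, 127

def affine_params(x_min: float, x_max: float):
    """Hypothetical helper: per-tensor scale and zero point mapping [x_min, x_max] onto int8."""
    # Widen the range to include 0.0 so that zero is represented exactly.
    x_min, x_max = min(float(x_min), 0.0), max(float(x_max), 0.0)
    scale = (x_max - x_min) / (INT8_MAX - INT8_MIN) or 1.0  # guard all-zero tensors
    zero_point = int(round(INT8_MIN - x_min / scale))
    return scale, max(INT8_MIN, min(INT8_MAX, zero_point))

def quantize(x, scale, zero_point):
    # Map float values to the nearest int8 step, saturating at the int8 limits.
    q = np.round(np.asarray(x, dtype=np.float32) / scale) + zero_point
    return np.clip(q, INT8_MIN, INT8_MAX).astype(np.int8)

def dequantize(q, scale, zero_point):
    # Recover an approximation of the original float values.
    return (q.astype(np.float32) - zero_point) * scale
```

For example, weights in [-0.5, 1.0] get a scale of 1.5 / 255 and a zero point of -43; every weight then occupies one byte instead of four, at the cost of a rounding error of at most half a step.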

Paper Structure

This paper contains 6 sections, 3 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The state diagram showing the behavior architecture of the proposed approach
  • Figure 2: A complex speech command is broken up into smaller chunks, each recognized by one of the two ML models
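Figures 1 and 2 describe a state-based behavior and a scheme in which a complex command is split into chunks, each handled by one of two ML models. The following is a minimal sketch of such routing under hypothetical assumptions not confirmed by this summary: one model detects the wake word, the other classifies command keywords, each string chunk stands in for a fixed-length audio window, and a command ends at a "silence" label or after max_keywords keywords. The names run_commands, wake_model, and command_model are illustrative.

```python
from enum import Enum, auto
from typing import Callable, Iterable, List

class State(Enum):
    IDLE = auto()     # only the wake-word model runs
    COMMAND = auto()  # only the command model runs, one chunk at a time

def run_commands(chunks: Iterable[str],
                 wake_model: Callable[[str], bool],
                 command_model: Callable[[str], str],
                 max_keywords: int = 4) -> List[List[str]]:
    """Hypothetical two-model router: each chunk goes to exactly one model."""
    state, current, commands = State.IDLE, [], []
    for chunk in chunks:
        if state is State.IDLE:
            if wake_model(chunk):
                state, current = State.COMMAND, []
            continue
        label = command_model(chunk)
        if label != "silence":
            current.append(label)
        # Assumed rule: a pause or a full-length command closes the command.
        if label == "silence" or len(current) == max_keywords:
            if current:
                commands.append(current)
            state = State.IDLE
    if state is State.COMMAND and current:
        commands.append(current)  # flush a command cut off by end of stream
    return commands
```

In this sketch only one model runs per chunk, which keeps per-window compute bounded on a microcontroller; whether the paper's implementation follows the same rule is not stated in this summary.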