Luganda Speech Intent Recognition for IoT Applications

Andrew Katumba; Sudi Murindanyi; John Trevor Kasule; Elvis Mugume

TL;DR

This work develops an edge-friendly speech intent recognition system for Luganda, a low-resource language, aimed at IoT applications. It pairs MFCC feature extraction with Conv2D CNNs to classify Luganda voice commands, and demonstrates deployment on resource-constrained devices (Raspberry Pi and Wio Terminal) with MQTT-based IoT integration. A crowd-sourced dataset of Luganda commands covering 20 intents is released openly to support reproducibility and further research. The study also explores data augmentation and model quantization to enable real-time edge inference. Together, on-device processing, open data, and IoT deployment offer a practical path toward inclusive smart-home technology in Luganda-speaking regions, addressing localization and connectivity challenges in low-resource settings.
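To make the MQTT integration concrete, the sketch below shows one way a classified intent could be turned into an MQTT topic and payload. It is an illustration under stated assumptions, not the authors' implementation: the intent labels, topic names, payload fields, and confidence threshold are all hypothetical, since this summary does not list the 20 intents or the topic layout.

```python
import json
from typing import Optional, Tuple

# Hypothetical intent labels and topics (assumptions for illustration only);
# the dataset's actual 20 intents and the topic layout are not given here.
INTENT_ROUTES = {
    "lights_on": ("home/light", "ON"),
    "lights_off": ("home/light", "OFF"),
    "fan_on": ("home/fan", "ON"),
    "fan_off": ("home/fan", "OFF"),
}


def build_message(
    intent: str, confidence: float, threshold: float = 0.8
) -> Optional[Tuple[str, str]]:
    """Return (topic, JSON payload) for a recognised intent, or None to ignore it.

    Predictions below the (assumed) confidence threshold are dropped so that
    a misheard command does not switch a device.
    """
    if confidence < threshold or intent not in INTENT_ROUTES:
        return None
    topic, state = INTENT_ROUTES[intent]
    payload = json.dumps({"state": state, "confidence": round(confidence, 2)})
    return topic, payload


if __name__ == "__main__":
    print(build_message("lights_on", 0.93))
    print(build_message("lights_on", 0.40))
```

The returned pair could then be handed to any MQTT client's publish call. Keeping message construction separate from the network client makes the mapping testable without a broker.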

Abstract

The advent of Internet of Things (IoT) technology has generated massive interest in voice-controlled smart homes. Most voice-controlled smart home systems are designed for widely spoken languages such as English, so speakers of low-resource languages such as Luganda remain underserved. This research project aimed to develop a Luganda speech intent classification system for IoT applications, integrating local languages into smart home environments. The system is built from three hardware components: a Raspberry Pi, which processes Luganda voice commands; a Wio Terminal, which serves as a display device; and ESP32 nodes, which control the IoT devices. The ultimate objective, voice control in Luganda, was accomplished through a natural language processing (NLP) model deployed on the Raspberry Pi. The model uses Mel Frequency Cepstral Coefficients (MFCCs) as acoustic features and a two-dimensional convolutional neural network (Conv2D) architecture for speech intent classification. A dataset of Luganda voice commands was curated for this purpose and has been made open-source. By incorporating Luganda voice commands, this work addresses the localization challenges and linguistic diversity of IoT applications, enabling users to interact with smart home devices without English proficiency, especially in regions where local languages are predominant.
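As a rough illustration of the MFCC front end, the following sketch computes MFCCs with the textbook recipe: framing, Hamming window, power spectrum, triangular mel filterbank, logarithm, and DCT-II. It is a minimal sketch, not the authors' pipeline. The sample rate, frame length, hop size, filter count, and coefficient count are common defaults assumed here, and the naive pure-Python DFT is chosen for readability; a real deployment would more likely rely on an optimized signal-processing library.

```python
import cmath
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [
        s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, s in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):  # rising edge
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):  # falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(
    signal: List[float],
    sample_rate: int = 16000,  # assumed default, not taken from the paper
    frame_len: int = 400,      # 25 ms at 16 kHz (assumed)
    hop: int = 160,            # 10 ms at 16 kHz (assumed)
    n_filters: int = 26,
    n_coeffs: int = 13,
) -> List[List[float]]:
    """Return a frames x n_coeffs matrix of MFCCs."""
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        # Small constant avoids log(0) on silent frames or empty filters.
        log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
        # DCT-II decorrelates the log filterbank energies.
        features.append([
            sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m, e in enumerate(log_e))
            for c in range(n_coeffs)
        ])
    return features
```

The resulting frames-by-coefficients matrix can be treated as a single-channel image, which is what makes a 2D convolutional classifier a natural fit for this kind of input.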
