BanglaDialecto: An End-to-End AI-Powered Regional Speech Standardization

Md. Nazmus Sadat Samin; Jawad Ibn Ahad; Tanjila Ahmed Medha; Fuad Rahman; Mohammad Ruhul Amin; Nabeel Mohammed; Shafin Rahman

TL;DR

This study presents an end-to-end pipeline for converting dialectal Noakhali speech to standard Bangla speech. The pipeline is completed by AlignTTS, a text-to-speech (TTS) model, which serves as the final stage that turns standardized text back into speech.

Abstract

This study focuses on recognizing Bangladeshi dialects and converting diverse Bengali accents into standardized formal Bengali speech. Dialects, often referred to as regional languages, are distinctive variations of a language spoken in a particular location and are identified by their phonetics, pronunciation, and lexicon. Subtle differences in pronunciation and intonation are also shaped by geographic location, educational attainment, and socioeconomic status. Dialect standardization is needed to ensure effective communication, educational consistency, access to technology, economic opportunities, and the preservation of linguistic resources while respecting cultural diversity. Bangla is the fifth most spoken language, with around 55 distinct dialects spoken by 160 million people, so addressing its dialects is crucial for developing inclusive communication tools. However, research remains limited because comprehensive datasets are lacking and diverse dialects are difficult to handle. Advances in multilingual Large Language Models (mLLMs) have opened new possibilities for addressing the challenges of dialectal Automatic Speech Recognition (ASR) and Machine Translation (MT). This study presents an end-to-end pipeline for converting dialectal Noakhali speech to standard Bangla speech. The investigation includes constructing a large-scale, diverse dataset of dialectal speech signals, which was used to fine-tune ASR and LLM models to transcribe dialect speech into dialect text and to translate dialect text into standard Bangla text. Our experiments showed that the fine-tuned Whisper ASR model achieved a character error rate (CER) of 0.8% and a word error rate (WER) of 1.5%, while the BanglaT5 model attained a BLEU score of 41.6% for dialect-to-standard text translation.
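The ASR results above are reported as CER and WER. As a minimal sketch, assuming the standard definitions (Levenshtein edit distance between a reference and a hypothesis transcript, divided by the reference length), the two metrics can be computed as follows. The function names are illustrative and not taken from the paper, and the paper's text normalization is not described here. This sketch assumes words are split on whitespace, spaces count as characters for CER, and each Unicode code point counts as one character; for Bangla, that last choice separates dependent vowel signs from their base consonants, which is one reason reported CER values can differ between toolkits.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions, deletions, and insertions each cost 1."""
    # prev[j] is the distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # delete ref[i-1]
                curr[j - 1] + 1,     # insert hyp[j-1]
                prev[j - 1] + cost,  # substitute, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; assumes whitespace tokenization (an illustrative choice)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate over Unicode code points, spaces included."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, `wer("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. Multiplying the result by 100 gives the percentage form used in the abstract. Corpus-level scores are usually computed as total edits divided by total reference length, which is not the same as averaging per-utterance rates.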
