Enhancing Bangla Language Next Word Prediction and Sentence Completion through Extended RNN with Bi-LSTM Model On N-gram Language

Md Robiul Islam, Al Amin, Aniqua Nusrat Zereen

TL;DR

The paper tackles Bangla next-word prediction and sentence completion by combining a Bi-LSTM architecture with n-gram based preprocessing on a large Bangla corpus collected from multiple news sources. It introduces a five-model framework (one model per n-gram length), each with an embedding layer, two Bi-LSTM layers, and two dense layers, and reports high accuracy on higher-order predictions (4-gram: ~99%, 5-gram: ~99.74%). The approach also completes sentences by generating one word at a time until a sentence boundary is reached. The work contributes a sizable Bangla dataset, reports improved prediction performance over existing methods, and commits to releasing the data publicly to support further research and real-world deployment.
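The n-gram preprocessing described above can be illustrated with a minimal sketch. It assumes that an n-gram model takes n context words and predicts the word that follows, and that tokens are obtained by whitespace splitting; neither detail is confirmed by this summary. The helper name `ngram_pairs` and the sample sentence are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[Tuple[str, ...], str]


def ngram_pairs(tokens: List[str], n: int) -> List[Pair]:
    """Slide a window over the tokens and return (context, next_word) pairs.

    Assumption: an "n-gram" model sees n context words and predicts the
    word that follows them. The paper may define the window differently.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [
        (tuple(tokens[i:i + n]), tokens[i + n])
        for i in range(len(tokens) - n)
    ]


# Hypothetical sample sentence; whitespace tokenisation is an assumption.
sentence = "আমি আজ বিকেলে বাজারে যেতে চাই"
tokens = sentence.split()

# One training set per n-gram length, mirroring the five-model framework.
datasets: Dict[int, List[Pair]] = {n: ngram_pairs(tokens, n) for n in range(1, 6)}

for n, pairs in datasets.items():
    print(n, len(pairs))
```

Each entry in `datasets` would feed the model for that n-gram length after the words are mapped to integer indices for the embedding layer.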

Abstract

Texting stands out as the most prominent form of communication worldwide. Individuals spend a significant amount of time writing out whole texts to send emails or post on social media, which is time-consuming. Word prediction and sentence completion can make writing in Bangla easier and more convenient. This paper expands the scope of Bangla language processing by introducing a Bi-LSTM model that handles both Bangla next-word prediction and Bangla sentence generation, demonstrating its versatility and potential impact. We propose a new Bi-LSTM model to predict the following word and complete a sentence. We constructed a corpus from various news portals, including bdnews24, BBC News Bangla, and Prothom Alo. The proposed approach reached 99% accuracy for both 4-gram and 5-gram word prediction. It also improved significantly over existing methods, achieving 35%, 75%, and 95% accuracy for uni-gram, bi-gram, and tri-gram word prediction, respectively.
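Sentence completion by repeated next-word prediction can be sketched as a simple loop. This is an illustration under stated assumptions, not the authors' implementation: the Bangla danda "।" is assumed to mark the sentence boundary, and a toy lookup table (`TOY_TABLE`, hypothetical) stands in for the trained Bi-LSTM predictor.

```python
from typing import Callable, List, Optional, Tuple

SENTENCE_END = "।"  # Assumption: the danda marks the sentence boundary.

Predictor = Callable[[Tuple[str, ...]], Optional[str]]


def complete_sentence(
    seed: List[str],
    predict_next: Predictor,
    context_size: int,
    max_new_words: int = 20,
) -> List[str]:
    """Append predicted words until a boundary token or the word limit."""
    words = list(seed)
    for _ in range(max_new_words):
        context = tuple(words[-context_size:])
        next_word = predict_next(context)
        if next_word is None:  # the predictor has no suggestion
            break
        words.append(next_word)
        if next_word == SENTENCE_END:
            break
    return words


# Toy stand-in for a trained model: a fixed context -> next-word table.
TOY_TABLE = {
    ("আমি", "আজ"): "বাজারে",
    ("আজ", "বাজারে"): "যাব",
    ("বাজারে", "যাব"): SENTENCE_END,
}


def toy_predictor(context: Tuple[str, ...]) -> Optional[str]:
    return TOY_TABLE.get(context)


completed = complete_sentence(["আমি", "আজ"], toy_predictor, context_size=2)
print(" ".join(completed))
```

The `max_new_words` cap is a safeguard added for this sketch, so that the loop ends even if the predictor never emits a boundary token.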

Paper Structure (10 sections, 1 equation, 8 figures, 4 tables)

This paper contains 10 sections, 1 equation, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Proposed methodology
  • Figure 2: Data Preprocessing Workflow
  • Figure 3: Structure of Bi-LSTM recurrent neural networks
  • Figure 4: The proposed Bi-LSTM model architecture
  • Figure 5: Next word prediction from the model
  • ...and 3 more figures