Bangla AI: A Framework for Machine Translation Utilizing Large Language Models for Ethnic Media

MD Ashraful Goni, Fahad Mostafa, Kerk F. Kee

TL;DR

This work tackles language barriers faced by Bangla-speaking diaspora communities by proposing Bangla AI, an end-to-end framework that combines LLMs and multilingual machine translation to improve news search, classification, and translation for Bangla ethnic media in the USA. The methodology formalizes a data-driven pipeline that collects multivariate news data, represents it as a matrix $X \in \mathbb{R}^{N\times M}$ with labels $Y$, trains a classifier $f$ to produce predictions $Y_{pred}=f(X)$, and translates content via $Bangla = T(\text{English})$. Key contributions include the mathematical representation of the data, an NLP-driven data collection pipeline that draws on multiple sources with copyright and misinformation safeguards, and an integrated translation/search workflow designed for marginalized communities. The approach aims to empower Bangla-speaking journalists and can extend to other ethnic media contexts, with explicit attention to the ethical considerations and policy needs of responsible deployment.
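As a rough illustration of the pipeline summarized above, the sketch below builds a small matrix $X$ from headlines, fits a classifier $f$, and applies a translation step $T$. Everything in it is an assumption made for illustration and is not taken from the paper: the bag-of-words features, the nearest-centroid classifier, the sample headlines and labels, and the helper names (`build_vocab`, `vectorize`, `fit_centroids`, `predict`, `translate`). In particular, `translate` is a word-by-word glossary lookup that only stands in for $T$; it is not an LLM or a real machine translation model.

```python
from collections import Counter


def build_vocab(docs):
    """Sorted vocabulary over all documents; its size is M, the number of columns."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})


def vectorize(docs, vocab):
    """Return X as an N x M list of lists holding term counts."""
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        rows.append([counts[term] for term in vocab])
    return rows


def fit_centroids(X, Y):
    """Average the rows of X per label; the centroids define the classifier f."""
    sums, sizes = {}, Counter(Y)
    for row, label in zip(X, Y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}


def predict(X, centroids):
    """Y_pred = f(X): assign each row to the nearest centroid (squared Euclidean)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(centroids, key=lambda label: dist(row, centroids[label])) for row in X]


def translate(text, glossary):
    """Hypothetical stand-in for T: glossary lookup, unknown words pass through."""
    return " ".join(glossary.get(word, word) for word in text.lower().split())


# Toy data (invented for this sketch): N = 4 headlines with category labels Y.
docs = [
    "election results announced today",
    "team wins cricket match",
    "parliament election vote",
    "cricket team match schedule",
]
Y = ["politics", "sports", "politics", "sports"]

vocab = build_vocab(docs)
X = vectorize(docs, vocab)
centroids = fit_centroids(X, Y)

# Classify a new headline, then pass it through the placeholder translation step.
query = "cricket match today"
Y_pred = predict(vectorize([query], vocab), centroids)
glossary = {"cricket": "ক্রিকেট", "match": "ম্যাচ", "today": "আজ"}
bangla = translate(query, glossary)
```

In a realistic setting the count features would likely be replaced by learned text embeddings and `translate` by a call to a multilingual translation model; the sketch only shows how $X$, $Y$, $f$, and $T$ relate to one another.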

Abstract

Ethnic media, which caters to diaspora communities in host nations, is a vital platform through which these communities both produce content and access information. Rather than using the language of the host nation, ethnic media delivers news in the language of the immigrant community; in the USA, for instance, Bangla ethnic media presents news in Bangla rather than English. This research examines the prospective integration of large language models (LLMs) and multilingual machine translation (MMT) within the ethnic media industry. It centers on the potential of LLM-based MMT across news translation, search, and categorization. The paper outlines a theoretical framework for integrating LLMs and MMT into the news search and translation processes of ethnic media, and briefly addresses the ethical challenges that this integration may raise.

Paper Structure

This paper contains 7 sections, 4 equations, 1 figure, 1 algorithm.

Figures (1)

  • Figure 1: Flowchart of Bangla News Translation for Ethnic Media.