Accelerating Bangla NLP Tasks with Automatic Mixed Precision: Resource-Efficient Training Preserving Model Efficacy

Md Mehrab Hossain Opi, Sumaiya Khan, Moshammad Farzana Rahman

TL;DR

The paper addresses the high cost of training Bangla NLP models in hardware-constrained settings. It investigates Automatic Mixed Precision (AMP) training as a way to cut GPU memory use and training time while preserving task performance. The authors evaluate AMP on sentiment analysis, named entity recognition (NER), error classification, and question answering using BanglaBERT, BanglishBERT, mBERT, and XLM-R. AMP accelerates training by 44.5% and reduces memory consumption by 17.6%, with F1 scores staying at 99.7% of the full-precision baselines, suggesting that AMP is a practical tool for democratizing Bangla NLP research.

Abstract

Training models for Natural Language Processing (NLP) requires substantial computational resources and time, which poses significant challenges for NLP development in Bangla, where access to high-end hardware is often limited. In this work, we explore automatic mixed precision (AMP) training as a means to improve computational efficiency without sacrificing model performance. By leveraging a dynamic mix of 16-bit and 32-bit floating-point computations, AMP lowers GPU memory requirements and speeds up training. We evaluate AMP across four standard Bangla NLP tasks, namely sentiment analysis, named entity recognition, error classification, and question answering, using four transformer-based models: BanglaBERT, BanglishBERT, XLM-R, and mBERT. Our results demonstrate that AMP accelerates training by 44.5% and reduces memory consumption by 17.6%, while retaining 99.7% of the F1 score of the full-precision baselines. This empirical study highlights AMP's potential to democratize access to state-of-the-art NLP capabilities in hardware-constrained settings by lowering computational barriers.
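
To make the 16-bit/32-bit trade-off concrete, here is a minimal sketch that uses only the Python standard library. It is not the authors' training code. It shows that a 16-bit value takes half the storage of a 32-bit one, and that a very small gradient rounds to zero in 16-bit unless it is scaled up first. The scaling step is an assumption here: this summary does not describe it, but it is the loss-scaling mechanism AMP implementations commonly pair with 16-bit arithmetic. The helper names and the scale factor of 1024 are hypothetical and chosen only for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (32-bit) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Storage per value: the source of the memory savings.
FP16_BYTES = struct.calcsize("<e")  # 2
FP32_BYTES = struct.calcsize("<f")  # 4

# Hypothetical scale factor, chosen for illustration only.
LOSS_SCALE = 1024.0


def fp16_grad_naive(grad: float) -> float:
    """Store a gradient directly in 16-bit; tiny values underflow to zero."""
    return to_fp16(grad)


def fp16_grad_scaled(grad: float, scale: float = LOSS_SCALE) -> float:
    """Scale before the 16-bit cast, then unscale in 32-bit."""
    scaled = to_fp16(grad * scale)  # value as it would be held in 16-bit
    return to_fp32(scaled / scale)  # unscale in 32-bit before the weight update


if __name__ == "__main__":
    tiny_grad = 1e-8  # below the smallest positive 16-bit value
    print("bytes per value (fp16, fp32):", FP16_BYTES, FP32_BYTES)
    print("naive fp16 gradient: ", fp16_grad_naive(tiny_grad))
    print("scaled fp16 gradient:", fp16_grad_scaled(tiny_grad))
```

In practice a framework's AMP utilities choose per-operation precision and adjust the scale factor automatically; the fixed scale above only illustrates why such a step is needed.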

Paper Structure

This paper contains 13 sections, 3 figures, 5 tables.

Figures (3)

  • Figure 1: Overview of the proposed AMP training methodology for Bangla NLP tasks.
  • Figure 2: Comparison of FP32 and AMP training efficiency across four Bangla NLP tasks.
  • Figure 3: Comparison of AMP and full-precision training across batch sizes. (a) Throughput, (b) Training time, (c) GPU memory usage. AMP substantially improves throughput and reduces training time without significant memory overhead.