Non-Contextual BERT or FastText? A Comparative Analysis

Abhay Shanbhag, Suramya Jadhav, Amogh Thakurdesai, Ridhima Sinare, Raviraj Joshi

TL;DR

The paper tackles the challenge of NLP in low-resource Marathi by comparing non-contextual BERT embeddings against FastText across sentiment, hate speech, and news classification tasks. It employs two BERT-based models and two FastText models. To keep the comparison fair, it reduces the embeddings to a common 300-dimension size via SVD and evaluates them with 5-fold cross-validation. The results show contextual BERT embeddings consistently outperform FastText, while non-contextual BERT often beats FastText but suffers when compressed, highlighting a trade-off between efficiency and representational power. The findings provide practical guidance for Marathi NLP in low-resource settings and support non-contextual BERT as a resource-efficient alternative when full contextual inference is impractical.
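The SVD step mentioned above can be illustrated with a minimal sketch. It assumes 768-dimensional input vectors (the hidden size of BERT-base style models) and uses random data as a stand-in for real embeddings; the function name `svd_compress` is hypothetical, and the summary does not say which matrix the authors decompose, so this is one plausible reading rather than their exact procedure.

```python
import numpy as np

def svd_compress(embeddings, target_dim=300):
    """Project each row onto the top `target_dim` right singular vectors."""
    # full_matrices=False returns vt with shape (min(n, d), d),
    # rows ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(embeddings, full_matrices=False)
    return embeddings @ vt[:target_dim].T

rng = np.random.default_rng(0)
# Assumption: random 768-dim vectors stand in for BERT embeddings.
bert_like = rng.normal(size=(1000, 768))
compressed = svd_compress(bert_like, target_dim=300)
print(compressed.shape)
```

After this projection the BERT-style vectors have the same width as 300-dimensional FastText vectors, so the same downstream classifier can be trained on either.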

Abstract

Natural Language Processing (NLP) for low-resource languages, which lack large annotated datasets, faces significant challenges due to limited high-quality data and linguistic resources. The selection of embeddings plays a critical role in achieving strong performance in NLP tasks. While contextual BERT embeddings require a full forward pass, non-contextual BERT embeddings rely only on table lookup. Existing research has primarily focused on contextual BERT embeddings, leaving non-contextual embeddings largely unexplored. In this study, we analyze the effectiveness of non-contextual embeddings from BERT models (MuRIL and MahaBERT) and FastText models (IndicFT and MahaFT) for tasks such as news classification, sentiment analysis, and hate speech detection in one such low-resource language, Marathi. We compare these embeddings with their contextual and compressed variants. Our findings indicate that non-contextual BERT embeddings extracted from the model's first embedding layer outperform FastText embeddings, presenting a promising alternative for low-resource NLP.
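To make the "table lookup" idea concrete, here is a toy sketch in plain NumPy. The vocabulary, the random 8-dimensional table, the function name, and the mean pooling are all assumptions made for illustration; the abstract does not state how token vectors are pooled into a sentence vector. In practice the table would be the input embedding matrix of a model such as MuRIL or MahaBERT.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: a tiny vocabulary and a random table stand in for a real
# tokenizer and a pretrained embedding matrix.
vocab = {"[PAD]": 0, "[UNK]": 1, "chitrapat": 2, "khup": 3, "chhan": 4, "aahe": 5}
embedding_table = rng.normal(size=(len(vocab), 8))

def non_contextual_sentence_embedding(tokens, vocab, table):
    """Fetch each token's row from the table and mean-pool; no forward pass."""
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in tokens]
    return table[ids].mean(axis=0)

sent_a = non_contextual_sentence_embedding(
    ["chitrapat", "khup", "chhan", "aahe"], vocab, embedding_table)
sent_b = non_contextual_sentence_embedding(
    ["aahe", "chhan", "khup", "chitrapat"], vocab, embedding_table)

# Each token always maps to the same vector, so word order has no effect.
print(np.allclose(sent_a, sent_b))
```

The two sentences contain the same tokens in a different order and receive the same vector, which is exactly what "non-contextual" means here: a contextual forward pass would let each token's representation depend on its neighbours, at the cost of running the full model.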

Paper Structure

This paper contains 14 sections, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Embedding extraction workflow for contextual and non-contextual representations
  • Figure 2: SVD compression of BERT embeddings
  • Figure 3: t-SNE plot for BERT and FastText embeddings (c stands for compressed)
  • Figure 4: t-SNE visualisation