GWPT: A Green Word-Embedding-based POS Tagger

Chengwei Wei, Runqi Pang, C. -C. Jay Kuo

TL;DR

GWPT tackles the need for fast, energy-efficient POS tagging by proposing a green learning-based tagger that avoids heavy DL architectures. It employs a three-stage cascade—representation learning from word embeddings with frequency-based dimension partitioning and adaptive N-grams, followed by discriminant feature selection via the discriminant feature test (DFT) and an XGBoost classifier—for POS prediction. A key component is frequency analysis of embedding dimensions to guide N-gram choices and dimensionality reduction, enabling compact representations without sacrificing accuracy. Experiments on PTB and UD show GWPT achieving competitive tagging accuracy with substantially fewer parameters and lower FLOPs than DL-based taggers and MultiBPEmb, making it suitable for edge devices and energy-constrained settings; future work may incorporate character embeddings or lighter classifiers for multi-class POS tagging.

Abstract

As a fundamental tool for natural language processing (NLP), the part-of-speech (POS) tagger assigns the POS label to each word in a sentence. A novel lightweight POS tagger based on word embeddings is proposed and named GWPT (green word-embedding-based POS tagger) in this work. Following the green learning (GL) methodology, GWPT contains three modules in cascade: 1) representation learning, 2) feature learning, and 3) decision learning modules. The main novelty of GWPT lies in representation learning. It uses non-contextual or contextual word embeddings, partitions embedding dimension indices into low-, medium-, and high-frequency sets, and represents them with different N-grams. It is shown by experimental results that GWPT offers state-of-the-art accuracies with fewer model parameters and significantly lower computational complexity in both training and inference as compared with deep-learning-based methods.
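
The frequency-based partitioning of embedding dimensions described above can be illustrated with a small sketch. It assumes, since the abstract does not spell it out, that a dimension's "frequency" is measured by how often its value changes sign between consecutive words in a sentence, averaged over sentences. The function names, the exact ratio definition, and the cut points are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def sign_change_ratio(values: Sequence[float]) -> float:
    """Fraction of adjacent pairs in `values` with opposite signs.

    Assumption: a pair counts as a sign change only when the product of the
    two values is strictly negative (zeros never count as a change).
    """
    n = len(values)
    if n < 2:
        return 0.0
    changes = sum(1 for a, b in zip(values, values[1:]) if a * b < 0)
    return changes / (n - 1)


def averaged_ratio(sentences: List[List[List[float]]]) -> List[float]:
    """Average the per-sentence ratio for every embedding dimension.

    `sentences[s][t][d]` is dimension d of the embedding of word t in sentence s.
    """
    dim = len(sentences[0][0])
    totals = [0.0] * dim
    for sent in sentences:
        for d in range(dim):
            totals[d] += sign_change_ratio([word[d] for word in sent])
    return [t / len(sentences) for t in totals]


def partition_by_frequency(
    ratios: Sequence[float], low_count: int, high_start: int
) -> Tuple[List[int], List[int], List[int]]:
    """Sort dimensions by ratio (ascending) and split at two rank cut points.

    The cut points are passed in by hand here; choosing them from elbow
    points of the sorted curve is left out of this sketch.
    """
    order = sorted(range(len(ratios)), key=lambda d: ratios[d])
    return order[:low_count], order[low_count:high_start], order[high_start:]


# Toy example: one sentence of three words with 3-dimensional embeddings.
toy = [[[1.0, 1.0, -1.0], [-1.0, 2.0, -2.0], [1.0, 3.0, 1.0]]]
ratios = averaged_ratio(toy)
low, mid, high = partition_by_frequency(ratios, low_count=1, high_start=2)
```

In this toy case, dimension 1 never changes sign and lands in the low-frequency set, while dimension 0 flips at every step and lands in the high-frequency set. Each set could then be represented with a different N-gram length, as the abstract describes.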

Paper Structure (14 sections, 1 equation, 5 figures, 6 tables)

Figures (5)

  • Figure 1: The system diagram of the GWPT method.
  • Figure 2: The averaged normalized sign-change ratio (NSR) as a function of the sorted embedding dimension index, from the smallest value ($l=1$) to the largest value ($l=768$), on the Penn Treebank dataset using the BERT word embedding. Dimension indices are partitioned into low-, mid-, and high-frequency sets using two elbow points at $l=50$ and $l=751$.
  • Figure 3: The validation error rate as a function of the XGBoost tree numbers for each class on the UD datasets: (top) fastText and (bottom) BERT.
  • Figure 4: Sorted discriminability for each feature dimension selected by DFT and validation and test accuracies on the UD dataset. A lower cross-entropy value indicates a more discriminant feature.
  • Figure 5: The effect of the maximum depth and the tree number in XGBoost on GWPT for the UD test set: POS tagging accuracy (top) and the model size (bottom).
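
The per-dimension discriminability score mentioned in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: each feature is scored by the best binary split of its value range, and the score is the sample-weighted label entropy of the two sides, used here as a stand-in for the cross-entropy value the caption refers to. The function names and the `bins` parameter are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of the label distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_loss(feature: Sequence[float], labels: Sequence[int], bins: int = 16) -> float:
    """Lowest weighted entropy over evenly spaced candidate thresholds.

    A lower value means the feature separates the classes better.
    """
    lo, hi = min(feature), max(feature)
    if hi == lo:
        # A constant feature cannot split the samples at all.
        return entropy(labels)
    n = len(labels)
    best = float("inf")
    for k in range(1, bins):
        thr = lo + (hi - lo) * k / bins
        left = [y for x, y in zip(feature, labels) if x < thr]
        right = [y for x, y in zip(feature, labels) if x >= thr]
        loss = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        best = min(best, loss)
    return best


def rank_features(columns: List[Sequence[float]], labels: Sequence[int], bins: int = 16) -> List[int]:
    """Return feature indices sorted from most to least discriminant."""
    losses = [split_loss(col, labels, bins) for col in columns]
    return sorted(range(len(columns)), key=lambda j: losses[j])


# Toy example: feature `good` separates the two classes, feature `bad` does not.
labels = [0, 0, 1, 1]
good = [0.0, 0.1, 0.9, 1.0]
bad = [0.0, 1.0, 0.0, 1.0]
ranking = rank_features([bad, good], labels)
```

Keeping only the top-ranked indices from `ranking` gives a reduced feature set that could then be passed to a classifier such as XGBoost.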