Text Classification: Neural Networks VS Machine Learning Models VS Pre-trained Models

Christos Petridis

TL;DR

This study benchmarks text classification across three paradigms: pre-trained transformer models, standard neural networks, and traditional machine learning algorithms, using TF-IDF and GloVe embeddings. It demonstrates that pre-trained transformers (e.g., BERT, RoBERTa, XLM-RoBERTa) consistently outperform other approaches, especially on the level-1 task, while level-2 remains harder due to more classes. Embedding choice is crucial for non-transformer models, with GloVe providing clear gains over TF-IDF; nonetheless, traditional methods lag behind fine-tuned transformers. The work also highlights the practicality of transfer learning and notes some anomalies (e.g., ALBERT on level-2) and the trade-offs between model size, speed, and accuracy for deployment decisions.

Abstract

Text classification is a common task, and many efficient methods and algorithms are available for it. Transformers have revolutionized deep learning, particularly Natural Language Processing (NLP), and have rapidly expanded to other domains such as computer vision and time-series analysis. The transformer model was first introduced for machine translation, and its architecture relies on self-attention mechanisms to capture complex relationships within data sequences. It handles long-range dependencies more effectively than traditional neural networks such as Recurrent Neural Networks and Multilayer Perceptrons. In this work, we compare different techniques for text classification. We consider seven pre-trained models, three standard neural networks, and three machine learning models. For the standard neural networks and machine learning models, we also compare two embedding techniques, TF-IDF and GloVe, with the latter consistently outperforming the former. Finally, we present our experimental results, in which pre-trained models such as BERT and DistilBERT always outperform the standard models and algorithms.
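To make the TF-IDF side of the embedding comparison concrete, the following is a minimal sketch of how TF-IDF feature vectors can be computed from tokenized text. It assumes a smoothed IDF variant, log((1 + N) / (1 + df)) + 1, since the exact weighting used in the paper is not stated here. The function name `tfidf_vectors` and the toy corpus are hypothetical and are not taken from the paper or its dataset.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of tokenized documents."""
    n_docs = len(docs)
    vocab = sorted({tok for doc in docs for tok in doc})
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumed variant, not necessarily the paper's).
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        # Term frequency (relative count) times inverse document frequency.
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical toy corpus for illustration only.
docs = [
    "cheap laptop with fast processor".split(),
    "fast delivery of cheap shoes".split(),
    "running shoes for men".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

Each document becomes a sparse vector with one dimension per vocabulary term, and terms that occur in fewer documents receive larger weights. GloVe, by contrast, maps each word to a dense pre-trained vector, so the two representations differ in both dimensionality and the information they carry.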

Paper Structure (25 sections, 3 equations, 8 figures, 10 tables)

Introduction
Related Work
Embeddings
Dataset
Distribution of classes
Data pre-processing
Merge the Features
Data Cleaning and Tokenization
Lemmatization and Stopwords Removal
Neural Networks
Training Phase
Neural Networks Results
Machine Learning Models
Hyperparameter tuning
K-Fold cross validation
...and 10 more sections

Figures (8)

Figure 1: Distribution of the classes in our dataset. It is also evident how many level-2 categories we have under each level-1 category.
Figure 2: Training Loss (on the left) and Test Accuracy (on the right) employing GloVe for level-1 category.
Figure 3: Training Loss (on the left) and Test Accuracy (on the right) employing TF-IDF for level-1 category.
Figure 4: Training Loss (on the left) and Test Accuracy (on the right) employing GloVe for level-2 category.
Figure 5: Training Loss (on the left) and Test Accuracy (on the right) employing TF-IDF for level-2 category.
...and 3 more figures
