Exploring Variability in Fine-Tuned Models for Text Classification with DistilBERT

Giuliano Lorenzoni; Ivens Portugal; Paulo Alencar; Donald Cowan

TL;DR

The paper investigates how hyperparameter interactions shape performance when fine-tuning DistilBERT for text classification, focusing on learning rate, batch size, and number of epochs. It fits two polynomial regression frameworks, an absolute one (baseline-focused) and a relative one (baseline-difference), to accuracy, F1-score, and loss, using 55 DistilBERT variants from Hugging Face. Key findings: batch size significantly affects accuracy and F1-score in the absolute regression, learning rate drives incremental gains in the relative regression, and the interaction between epochs and batch size is crucial for F1 optimization. The work advocates adaptive, metric-aware fine-tuning frameworks that account for non-linear hyperparameter effects and cross-metric trade-offs, with implications for other NLP tasks, computer vision tasks, and broader LLM tuning practices.

Abstract

This study evaluates fine-tuning strategies for text classification using the DistilBERT model, specifically the distilbert-base-uncased-finetuned-sst-2-english variant. Through structured experiments, we examine the influence of hyperparameters such as learning rate, batch size, and epochs on accuracy, F1-score, and loss. Polynomial regression analyses capture foundational and incremental impacts of these hyperparameters, focusing on fine-tuning adjustments relative to a baseline model. Results reveal variability in metrics due to hyperparameter configurations, showing trade-offs among performance metrics. For example, a higher learning rate reduces loss in relative analysis (p=0.027) but challenges accuracy improvements. Meanwhile, batch size significantly impacts accuracy and F1-score in absolute regression (p=0.028 and p=0.005) but has limited influence on loss optimization (p=0.170). The interaction between epochs and batch size maximizes F1-score (p=0.001), underscoring the importance of hyperparameter interplay. These findings highlight the need for fine-tuning strategies addressing non-linear hyperparameter interactions to balance performance across metrics. Such variability and metric trade-offs are relevant for tasks beyond text classification, including NLP and computer vision. This analysis informs fine-tuning strategies for large language models and promotes adaptive designs for broader model applicability.
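
To make the absolute and relative regression framing more concrete, the following is a minimal sketch of a degree-2 polynomial regression with pairwise interaction terms over the three hyperparameters. It is not the authors' code: the function names, the standardized synthetic data, and the single constant `baseline_accuracy` are all assumptions made for illustration, and the sketch fits coefficients only (it does not compute the p-values reported in the abstract).

```python
import numpy as np

# Hypothetical feature names for a degree-2 polynomial in three hyperparameters.
FEATURE_NAMES = [
    "intercept", "lr", "batch", "epochs",
    "lr^2", "batch^2", "epochs^2",
    "lr*batch", "lr*epochs", "batch*epochs",
]


def design_matrix(lr, batch, epochs):
    """Build intercept, linear, squared, and pairwise interaction columns."""
    lr, batch, epochs = (np.asarray(v, dtype=float) for v in (lr, batch, epochs))
    columns = [
        np.ones_like(lr), lr, batch, epochs,
        lr ** 2, batch ** 2, epochs ** 2,
        lr * batch, lr * epochs, batch * epochs,
    ]
    return np.column_stack(columns)


def fit_polynomial(lr, batch, epochs, target):
    """Ordinary least-squares fit; returns a name -> coefficient mapping."""
    X = design_matrix(lr, batch, epochs)
    coef, *_ = np.linalg.lstsq(X, np.asarray(target, dtype=float), rcond=None)
    return dict(zip(FEATURE_NAMES, coef))


# Synthetic, noise-free data (NOT results from the paper). Hyperparameters are
# assumed to be standardized to [-1, 1] so the columns have comparable scale.
rng = np.random.default_rng(0)
n_models = 55
lr = rng.uniform(-1, 1, n_models)
batch = rng.uniform(-1, 1, n_models)
epochs = rng.uniform(-1, 1, n_models)
accuracy = 0.90 + 0.02 * batch + 0.01 * batch * epochs

# Assumed baseline score; the relative framework regresses on the difference.
baseline_accuracy = 0.91

absolute_fit = fit_polynomial(lr, batch, epochs, accuracy)
relative_fit = fit_polynomial(lr, batch, epochs, accuracy - baseline_accuracy)

print("batch*epochs (absolute):", round(absolute_fit["batch*epochs"], 4))
print("intercept shift:", round(absolute_fit["intercept"] - relative_fit["intercept"], 4))
```

Under this simplified reading, subtracting one constant baseline only shifts the intercept, so the two fits share every other coefficient. The abstract reports different significance levels for the absolute and relative analyses, which suggests the paper's relative setup differs from this sketch in ways not recoverable from the summary here.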

Paper Structure (49 sections, 12 figures)

Introduction
Related Work
Learning Rate Tuning in LLMs
Fine-Tuning Strategies for Domain-Specific Tasks
Parameter-Efficient Techniques for Domain Optimization
Supporting Fine-Tuning Through Automated Systems
Fine-Tuning for Low-Resource Languages
Optimizing LLM Performance at Scale
Detecting Bots with Transformer-Based Models
LLM Techniques for Document Understanding
Fine-Tuning DistilBERT in Classification Tasks
Experiment Design
Data Collection
Model Selection and Task Design
Hyperparameters and Evaluation Metrics
...and 34 more sections

Figures (12)

Figure 1: Heatmap of Parameter Correlations (Absolute Polynomial Regression)
Figure 2: Polynomial Regression Plot: Accuracy (Absolute)
Figure 3: Heatmap of Parameter Correlations (Relative Polynomial Regression)
Figure 4: Polynomial Regression Plot: Accuracy Difference (Relative)
Figure 5: Heatmap of Parameter Correlations (Absolute Polynomial Regression - F1)
...and 7 more figures
