BERTer: The Efficient One

Pradyumna Saligram; Andrew Lanpouthakoun

TL;DR

The paper tackles multi-task fine-tuning of BERT for sentiment analysis, paraphrase detection, and semantic textual similarity while prioritizing efficiency. It integrates SMART regularization, early exiting, cross-embedding Siamese concepts, and embedding-order alterations, supplemented by a novel Sequential Layer Focus (SLF) and Smart Alternating strategy. Empirical results show that SLF combined with Smart Alternating and Early Exiting delivers the best balance of accuracy (on SST, Para, and STS) and reduced training time, though the benefits of SMART are task-dependent. The work demonstrates how careful choices of architecture and training strategy can yield practical, efficient multi-task NLP models with competitive performance.

Abstract

We explore advanced fine-tuning techniques to boost BERT's performance in sentiment analysis, paraphrase detection, and semantic textual similarity. Our approach leverages SMART regularization to combat overfitting, refines hyperparameter choices, employs a cross-embedding Siamese architecture for improved sentence embeddings, and introduces new early exiting methods. Our findings reveal substantial improvements in model efficiency and effectiveness when multiple fine-tuning architectures are combined, achieving a state-of-the-art performance score on the test set, surpassing current benchmarks and highlighting BERT's adaptability across varied linguistic tasks.

Paper Structure (20 sections, 14 equations, 5 figures, 2 tables)

Introduction
Related Work
Approach
Baseline Multitask Classifier
SMART Regularization and Alternative Optimizers
Early Exiting Mechanisms and Novel Fine-Tuning Strategy
Order Alteration of Word Embeddings and Concatenation
Experiments
Data
Sentiment Analysis
Paraphrase Detection
Semantic Textual Similarity Analysis
Evaluation Method
Experimental Details
Results
...and 5 more sections

Figures (5)

Figure 1: SST Accuracy over epochs
Figure 2: SST accuracy by Learning Rates
Figure 3: Dev Score by Batch Size
Figure 4: Dev Score by Dropout Probability
