BERTer: The Efficient One

Pradyumna Saligram, Andrew Lanpouthakoun

TL;DR

The paper tackles multi-task fine-tuning of BERT for sentiment analysis, paraphrase detection, and semantic textual similarity while prioritizing efficiency. It integrates SMART regularization, early exiting, cross-embedding Siamese concepts, and embedding-order alterations, supplemented by a novel Sequential Layer Focus (SLF) and Smart Alternating strategy. Empirical results show that SLF combined with Smart Alternating and Early Exiting delivers the best balance of accuracy (SST, Para, STS) and reduced training time, though SMART's benefits are task-dependent. The work demonstrates how careful choices of architecture and training strategy can yield practical, efficient multi-task NLP models with competitive performance.

Abstract

We explore advanced fine-tuning techniques to boost BERT's performance in sentiment analysis, paraphrase detection, and semantic textual similarity. Our approach leverages SMART regularization to combat overfitting, improves hyperparameter choices, employs a cross-embedding Siamese architecture for improved sentence embeddings, and introduces innovative early exiting methods. Our findings reveal substantial improvements in model efficiency and effectiveness when combining multiple fine-tuning architectures, achieving a state-of-the-art performance score on the test set, surpassing current benchmarks and highlighting BERT's adaptability in multifaceted linguistic tasks.
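
To make the idea of early exiting concrete, here is a minimal sketch of one common variant: a classifier is attached to each encoder layer, and the forward pass stops at the first layer whose prediction entropy falls below a threshold. The abstract does not specify the authors' exit criterion, so the entropy rule, the function names, and the `threshold` parameter are all assumptions made for illustration, not the paper's method.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; low entropy means a confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit_predict(per_layer_logits, threshold):
    """Return (predicted_class, exit_layer_index).

    per_layer_logits: hypothetical input, one list of class logits per
    encoder layer, ordered from the shallowest to the deepest layer.
    threshold: assumed entropy cutoff; the first layer whose entropy is
    below it is used. The last layer is always used as a fallback.
    """
    if not per_layer_logits:
        raise ValueError("need logits from at least one layer")
    last = len(per_layer_logits) - 1
    for layer_idx, logits in enumerate(per_layer_logits):
        probs = softmax(logits)
        if entropy(probs) < threshold or layer_idx == last:
            return probs.index(max(probs)), layer_idx


# Illustrative logits for a 3-class task across 3 layers.
example_logits = [
    [0.1, 0.0, 0.2],  # shallow layer: nearly uniform, not confident
    [0.0, 5.0, 0.0],  # middle layer: confident in class 1
    [0.0, 9.0, 0.0],  # final layer
]
prediction, exit_layer = early_exit_predict(example_logits, threshold=0.3)
```

In a real model the per-layer logits would be computed lazily, so that layers after the exit point are never evaluated; the list here only stands in for that sequence of intermediate classifier outputs.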

Paper Structure

This paper contains 20 sections, 14 equations, 5 figures, and 2 tables.

Figures (5)

  • Figure 1: SST Accuracy over epochs
  • Figure 2: SST Accuracy by Learning Rate
  • Figure 3: Dev Score by Batch Size
  • Figure 4: Dev Score by Dropout Probability
  • Figure :