Multitask Fine-Tuning and Generative Adversarial Learning for Improved Auxiliary Classification

Christopher Sun; Abishek Satish

TL;DR

This work introduces Multitask BERT, a BERT model fine-tuned jointly on sentiment analysis, paraphrase detection, and semantic textual similarity (STS), enhanced by gradient surgery (PCGrad) and data augmentation. It also extends GAN-BERT into AC-GAN-BERT, a framework whose conditional generator exploits unlabeled data for auxiliary classification, and examines its quantitative and qualitative effects. On the test set, Multitask BERT achieves $0.516$ accuracy on SST sentiment classification, $0.886$ accuracy on paraphrase detection, and $0.864$ correlation on STS, for an overall score of $0.778$. AC-GAN-BERT yields class-conditioned embeddings and avoids mode collapse, though it does not consistently improve accuracy. These findings support the viability of generator-assisted representations for knowledge distillation and point to future improvements in loss design and regularization to maximize multitask and semi-supervised gains.
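The TL;DR names PCGrad as the gradient-surgery method used during multitask fine-tuning. The sketch below illustrates the PCGrad projection rule on flattened per-task gradient vectors in plain Python; it is not the authors' code. How the paper flattens BERT's parameter gradients, whether it sums or averages the projected gradients, and the helper names `pcgrad` and `dot` are assumptions made for illustration.

```python
import random
from typing import List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length gradient vectors."""
    return sum(x * y for x, y in zip(a, b))


def pcgrad(task_grads: List[Vector], seed: Optional[int] = None) -> Vector:
    """Combine per-task gradients using the PCGrad projection rule (sketch).

    For each task gradient g_i, every other task gradient g_j that conflicts
    with it (g_i . g_j < 0) is projected out of g_i:
        g_i <- g_i - (g_i . g_j / ||g_j||^2) * g_j
    The other tasks are visited in random order, and the projected gradients
    are summed into a single update for the shared parameters.
    """
    rng = random.Random(seed)
    projected = []
    for i, g_i in enumerate(task_grads):
        g = list(g_i)  # work on a copy; the original gradients are reused below
        others = [j for j in range(len(task_grads)) if j != i]
        rng.shuffle(others)
        for j in others:
            g_j = task_grads[j]
            overlap = dot(g, g_j)
            norm_sq = dot(g_j, g_j)
            if overlap < 0 and norm_sq > 0:
                coef = overlap / norm_sq
                g = [x - coef * y for x, y in zip(g, g_j)]
        projected.append(g)
    # Coordinate-wise sum of the surgically modified task gradients.
    return [sum(column) for column in zip(*projected)]


if __name__ == "__main__":
    # Two conflicting toy gradients (their dot product is -1).
    print(pcgrad([[1.0, 0.0], [-1.0, 1.0]], seed=0))  # → [0.5, 1.5]
```

With three tasks (SST, paraphrase, STS), `task_grads` would hold one gradient vector per task loss with respect to the shared BERT parameters, and the combined vector would stand in for the plain summed gradient in the optimizer step. Gradients that do not conflict (non-negative dot product) pass through unchanged.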

Abstract

In this study, we implement a novel BERT architecture for multitask fine-tuning on three downstream tasks: sentiment classification, paraphrase detection, and semantic textual similarity prediction. Our model, Multitask BERT, incorporates layer sharing, a triplet architecture, custom sentence-pair tokenization, loss pairing, and gradient surgery. These optimizations yield a test sentiment classification accuracy of 0.516, a paraphrase detection accuracy of 0.886, and a semantic textual similarity correlation of 0.864. We also apply generative adversarial learning to BERT, constructing a conditional generator that maps latent vectors to fake embeddings in $\mathbb{R}^{768}$. These fake embeddings are concatenated with real BERT embeddings and passed to a discriminator for auxiliary classification. Using this framework, which we call AC-GAN-BERT, we conduct semi-supervised sensitivity analyses of how increasing the amount of unlabeled training data affects AC-GAN-BERT's test accuracy. Beyond implementing a high-performing multitask classification system, our main novelty is the use of adversarial learning to construct a generator that mimics BERT. We find that the conditional generator produces rich embeddings whose spatial arrangement correlates clearly with class labels, indicating that it avoids mode collapse. Our findings validate the GAN-BERT approach and point to future directions in generator-aided knowledge distillation.
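To make the data flow described above concrete, here is a forward-pass sketch in plain Python (no training loop, no autograd) of a conditional generator and a GAN-BERT-style discriminator with k + 1 outputs. The following are assumptions, not details taken from the paper: the generator is conditioned by concatenating a one-hot label to the noise vector, both networks are two-layer MLPs with LeakyReLU, real and fake embeddings are concatenated along the batch dimension, and the discriminator loss follows the original GAN-BERT formulation (supervised cross-entropy over the k real classes plus two unsupervised real/fake terms). Any AC-GAN-specific loss term the authors add is not shown, and all names (`ConditionalGenerator`, `Discriminator`, `discriminator_loss`, `mlp`, `init_linear`, `log_softmax`) are hypothetical.

```python
import math
import random
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def init_linear(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random (n_out x n_in) weight matrix; biases are omitted for brevity."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def mlp(w1: Matrix, w2: Matrix, x: Vector, slope: float = 0.2) -> Vector:
    """Two-layer perceptron with a LeakyReLU hidden activation."""
    h = [sum(a * b for a, b in zip(row, x)) for row in w1]
    h = [v if v > 0 else slope * v for v in h]
    return [sum(a * b for a, b in zip(row, h)) for row in w2]


def log_softmax(logits: Vector) -> Vector:
    """Numerically stable log-softmax."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


class ConditionalGenerator:
    """Maps [noise z ; one-hot class label] to a fake embedding of size emb_dim."""

    def __init__(self, noise_dim, num_classes, hidden_dim, emb_dim, rng):
        self.noise_dim, self.num_classes, self.rng = noise_dim, num_classes, rng
        self.w1 = init_linear(noise_dim + num_classes, hidden_dim, rng)
        self.w2 = init_linear(hidden_dim, emb_dim, rng)

    def __call__(self, label: int) -> Vector:
        z = [self.rng.gauss(0.0, 1.0) for _ in range(self.noise_dim)]
        one_hot = [1.0 if c == label else 0.0 for c in range(self.num_classes)]
        return mlp(self.w1, self.w2, z + one_hot)


class Discriminator:
    """Outputs k + 1 logits: k real classes followed by one 'fake' class."""

    def __init__(self, emb_dim, num_classes, hidden_dim, rng):
        self.w1 = init_linear(emb_dim, hidden_dim, rng)
        self.w2 = init_linear(hidden_dim, num_classes + 1, rng)

    def __call__(self, emb: Vector) -> Vector:
        return mlp(self.w1, self.w2, emb)


def discriminator_loss(disc, real, labels: List[Optional[int]], fake) -> float:
    """GAN-BERT-style loss; labels[i] is None when real[i] is unlabeled."""
    batch = real + fake  # real and fake embeddings concatenated along the batch
    logits = [disc(e) for e in batch]
    n_real, eps = len(real), 1e-8
    # Supervised term: cross-entropy over the k real classes, labeled rows only.
    sup = [-log_softmax(lg[:-1])[y]
           for lg, y in zip(logits[:n_real], labels) if y is not None]
    # Unsupervised terms: real rows should avoid the fake class,
    # generated rows should be assigned to it.
    p_fake = [math.exp(log_softmax(lg)[-1]) for lg in logits]
    real_term = -sum(math.log(1.0 - p + eps) for p in p_fake[:n_real]) / n_real
    fake_term = -sum(math.log(p + eps) for p in p_fake[n_real:]) / len(fake)
    sup_term = sum(sup) / len(sup) if sup else 0.0
    return sup_term + real_term + fake_term


if __name__ == "__main__":
    rng = random.Random(0)
    k, emb_dim = 5, 768  # 5 classes chosen only for illustration
    gen = ConditionalGenerator(100, k, 32, emb_dim, rng)
    disc = Discriminator(emb_dim, k, 32, rng)
    # Random vectors stand in for real BERT sentence embeddings.
    real = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)] for _ in range(4)]
    labels = [0, 3, None, None]  # half of the real batch is unlabeled
    fake = [gen(y) for y in (1, 2, 4, 0)]
    print(len(fake[0]), len(disc(fake[0])))  # → 768 6
    print(discriminator_loss(disc, real, labels, fake))
```

Unlabeled real rows still contribute through the real/fake term, which is what lets a semi-supervised setup like this draw signal from unlabeled data. Training both networks (for example, updating the generator so that its outputs are not assigned to the fake class) is omitted from this sketch.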

Paper Structure

This paper contains 27 sections, 3 equations, 6 figures, and 4 tables.

Introduction and Related Work
Approach
Multitask BERT
Sentiment Analysis
Paraphrase Detection
Semantic Textual Similarity
Multitask Fine-tuning Optimizations
Summary of Semi-Supervised GAN Training Framework
Discriminator
Conditional Generator
Experiments
Data
Sentiment Analysis
Paraphrase Detection
Semantic Textual Similarity
...and 12 more sections

Figures (6)

Figure 1: Multitask BERT Architecture
Figure 2: Auxiliary Classifier GAN-BERT Architecture (number of hidden layers variable)
Figure 3: Unlabeling-Induced Accuracy Decrease
Figure 4: Unconditional GAN-BERT - 40% Unlabeled
Figure 5: AC-GAN-BERT - 40% Unlabeled
...and 1 more figure
