What Happens To BERT Embeddings During Fine-tuning?
Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, Ian Tenney
TL;DR
The paper examines how BERT representations change during fine-tuning on dependency parsing, MNLI, and SQuAD, using edge probing, structural probing, Representational Similarity Analysis (RSA), and layer ablations. It finds no catastrophic forgetting of the linguistic features learned in pre-training. Changes are concentrated in the top layers, and their depth varies by task: parsing reconfigures most of the model, while MNLI and SQuAD involve much shallower changes. RSA and ablations indicate that in-domain inputs drive most of the change, while representations of out-of-domain sentences stay close to their pre-trained state, suggesting room for improvement in generalization. Overall, fine-tuning acts as a conservative adaptation that preserves linguistic information while selectively reconfiguring the encoder.
Abstract
While there has been much recent work studying how linguistic information is encoded in pre-trained sentence representations, comparatively little is understood about how these models change when adapted to solve downstream tasks. Using a suite of analysis techniques (probing classifiers, Representational Similarity Analysis, and model ablations), we investigate how fine-tuning affects the representations of the BERT model. We find that while fine-tuning necessarily makes significant changes, it does not lead to catastrophic forgetting of linguistic phenomena. We instead find that fine-tuning primarily affects the top layers of BERT, but with noteworthy variation across tasks. In particular, dependency parsing reconfigures most of the model, whereas SQuAD and MNLI appear to involve much shallower processing. Finally, we also find that fine-tuning has a weaker effect on representations of out-of-domain sentences, suggesting room for improvement in model generalization.
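To make the Representational Similarity Analysis mentioned above concrete, the following is a minimal sketch of one common formulation, not necessarily the exact procedure used in the paper. It assumes a cosine-similarity kernel and a Pearson correlation over the upper triangles of the two similarity matrices. The function names (`cosine_similarity_matrix`, `rsa_score`) and the random arrays standing in for pre-trained and fine-tuned activations are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine_similarity_matrix(reps):
    """Pairwise cosine similarity between the rows of an (n, d) array."""
    norms = np.linalg.norm(reps, axis=1, keepdims=True)
    unit = reps / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def rsa_score(reps_a, reps_b):
    """RSA similarity between two representations of the same n inputs.

    Assumed formulation: build an n x n cosine-similarity matrix in each
    space, then correlate (Pearson) their strictly upper-triangular entries.
    The two spaces may have different dimensionality.
    """
    if reps_a.shape[0] != reps_b.shape[0]:
        raise ValueError("both representations must cover the same inputs")
    sim_a = cosine_similarity_matrix(reps_a)
    sim_b = cosine_similarity_matrix(reps_b)
    upper = np.triu_indices(reps_a.shape[0], k=1)  # skip the diagonal
    return float(np.corrcoef(sim_a[upper], sim_b[upper])[0, 1])


# Toy stand-ins for layer activations (hypothetical data, not model outputs).
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 16))

# An orthogonal rotation preserves cosine similarities between inputs.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
rotated = base @ rotation

# Independent noise shares no similarity structure with `base`.
unrelated = rng.normal(size=(50, 16))

score_same = rsa_score(base, rotated)
score_diff = rsa_score(base, unrelated)
print(f"rotated copy: {score_same:.3f}  unrelated: {score_diff:.3f}")
```

In an analysis like the one described here, `reps_a` and `reps_b` would be activations from the same layer of the pre-trained and fine-tuned models on an identical set of inputs. A score near 1 would indicate that fine-tuning left that layer's similarity structure largely intact, and a lower score would indicate a larger change.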
