Understanding the Effects of Domain Finetuning on LLMs
Eshaan Tanwar, Deepak Nathani, William Yang Wang, Tanmoy Chakraborty
TL;DR
The paper presents tuning vectors as a principled, interpretable framework for studying domain-specific fine-tuning in large language models. It shows that such fine-tuning largely preserves pretrained representations while adding targeted directional shifts. The authors find that these shifts predominantly introduce new directions in MLP layers and amplify existing directions in attention heads, improving instruction-following and generation on medical tasks. Cross-domain analysis indicates that tuning vectors are domain-specific and not readily transferable, although combining vectors across domains can boost generalisation on some benchmarks. The work offers a general methodology for understanding and engineering domain adaptation in LLMs, with implications for modular and trustworthy model specialisation.
Abstract
Large Language Models (LLMs) fine-tuned for specific domains exhibit strong performance; however, the mechanisms by which this fine-tuning reshapes their parametric space are not well understood. Prior work primarily focuses on autoregressive or general-purpose instruct models, leaving domain-specialised LLMs under-explored. We present the first systematic study of domain-specific fine-tuning in large medical language models. Our analysis reveals that fine-tuning modifies only a small subset of the representational subspace, largely preserving the pre-trained model's representation. To interpret these subspace changes, we propose tuning vectors, a novel framework inspired by task vectors, which explicitly capture the directional parameter shifts induced by fine-tuning. We demonstrate that these vectors are critical for enhancing both instruction-following and generation quality. Furthermore, combining tuning vectors across different domains yields improved generalisation. Analysing their directional alignment, we find that these vectors primarily write new directional information into the model's MLP layers while amplifying existing directions in attention heads. Our findings offer new insights into LLM adaptation and provide a general, interpretable framework for analysing specialisation in large language models.
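The abstract describes tuning vectors only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes, following the task-vector convention, that a tuning vector is the element-wise difference between fine-tuned and pretrained weights, and that combining domains means adding a scaled sum of such vectors back onto the pretrained weights. For directional alignment it swaps in a simple hypothetical proxy: the fraction of an update's squared norm that lies outside the top-k left singular subspace of the pretrained matrix. All names (`tuning_vector`, `apply_tuning_vectors`, `new_direction_fraction`, `alpha`, `k`, and the parameter keys) are hypothetical.

```python
import numpy as np

# Hypothetical container: parameter name -> weight matrix.
Params = dict[str, np.ndarray]


def tuning_vector(pre: Params, ft: Params) -> Params:
    """Assumed task-vector-style definition: fine-tuned minus pretrained weights."""
    return {name: ft[name] - pre[name] for name in pre}


def apply_tuning_vectors(pre: Params, vectors: list[Params], alpha: float = 1.0) -> Params:
    """Add alpha times the sum of one or more tuning vectors to the pretrained weights.

    Passing vectors from several domains is one simple (assumed) way to combine them.
    """
    merged = {name: w.copy() for name, w in pre.items()}
    for tau in vectors:
        for name, delta in tau.items():
            merged[name] = merged[name] + alpha * delta
    return merged


def new_direction_fraction(w_pre: np.ndarray, delta: np.ndarray, k: int) -> float:
    """Hypothetical alignment proxy, not the paper's metric.

    Returns the share of ||delta||_F^2 lying outside the span of the top-k left
    singular vectors of w_pre: 1.0 means the update writes entirely new output
    directions, 0.0 means it stays within directions w_pre already emphasises.
    """
    u, _, _ = np.linalg.svd(w_pre, full_matrices=False)
    u_k = u[:, :k]                    # orthonormal basis of the top-k output directions
    inside = u_k @ (u_k.T @ delta)    # orthogonal projection of delta onto that subspace
    total = float(np.sum(delta ** 2))
    if total == 0.0:
        return 0.0
    return float(np.sum((delta - inside) ** 2)) / total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for one MLP and one attention weight matrix.
    pre = {"mlp.w": rng.standard_normal((6, 6)), "attn.w": rng.standard_normal((6, 6))}
    # Two synthetic "domain" fine-tunes (small random perturbations).
    ft_a = {n: w + 0.05 * rng.standard_normal(w.shape) for n, w in pre.items()}
    ft_b = {n: w + 0.05 * rng.standard_normal(w.shape) for n, w in pre.items()}
    taus = [tuning_vector(pre, ft_a), tuning_vector(pre, ft_b)]
    merged = apply_tuning_vectors(pre, taus, alpha=0.5)
    for name in pre:
        print(name, new_direction_fraction(pre[name], taus[0][name], k=2))
```

Under this proxy, values near 1 correspond to the pattern the abstract reports for MLP layers (new directions), and values near 0 to the pattern reported for attention heads (amplified existing directions). The choice of k, of left versus right singular vectors, and of the merging coefficient `alpha` are design decisions the abstract does not settle.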
