Understanding the Effects of Domain Finetuning on LLMs

Eshaan Tanwar, Deepak Nathani, William Yang Wang, Tanmoy Chakraborty

TL;DR

The paper presents tuning vectors as a principled, interpretable framework to study domain-specific fine-tuning in large language models, revealing that such fine-tuning largely preserves pretrained representations while adding targeted directional shifts. The authors show that these shifts predominantly introduce new directions in MLP layers and refine attention, enabling improved instruction-following and generation in medical tasks. Cross-domain analysis indicates tuning vectors are domain-specific and not readily transferable, though combining vectors across domains can boost generalisation on some benchmarks. This work offers a general methodology for understanding and engineering domain adaptation in LLMs, with implications for modular and trustworthy model specialization.

Abstract

Large Language Models (LLMs) fine-tuned for specific domains exhibit strong performance; however, the underlying mechanisms by which this fine-tuning reshapes their parametric space are not well understood. Prior works primarily focus on auto-regressive or general-purpose instruct models, leaving domain-specialised LLMs under-explored. We present the first systematic study of domain-specific fine-tuning in large medical language models. Our analysis reveals that fine-tuning modifies only a small subset of the representational subspace, essentially preserving the pre-trained model's representation. To interpret these changes in subspaces, we propose tuning vectors, a novel framework inspired by task vectors, which explicitly capture the directional parameter shifts induced by fine-tuning. We demonstrate that these vectors are critical for enhancing both instruction-following and generation quality. Furthermore, combining tuning vectors across different domains yields improved generalisation. Upon closer inspection of directional alignment, we find these vectors primarily write new directional information into the MLP layers of the model, while amplifying existing directions in attention heads. Our findings offer new insights into LLM adaptation and provide a general, interpretable framework for analysing specialisation in large language models.
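The abstract describes tuning vectors as capturing the directional parameter shift induced by fine-tuning, in the spirit of task vectors. A minimal sketch of that idea follows, assuming a tuning vector is simply the per-parameter difference between fine-tuned and pretrained weights (the usual task-vector definition; the paper's exact formulation is not given in this summary). The parameter names, toy weights, and the helpers `tuning_vector`, `cosine_similarity`, and `apply_vectors` are all hypothetical and exist only for illustration.

```python
import math

def tuning_vector(pretrained, finetuned):
    """Per-parameter shift: finetuned minus pretrained (assumed definition)."""
    return {
        name: [f - p for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }

def cosine_similarity(u, v):
    """Cosine similarity between two vectors, flattened over all parameters.

    Raises ZeroDivisionError if either vector is all zeros.
    """
    dot = sum(a * b for name in u for a, b in zip(u[name], v[name]))
    norm_u = math.sqrt(sum(a * a for name in u for a in u[name]))
    norm_v = math.sqrt(sum(b * b for name in v for b in v[name]))
    return dot / (norm_u * norm_v)

def apply_vectors(pretrained, vectors, scale=1.0):
    """Add one or more scaled tuning vectors to a copy of the pretrained weights."""
    merged = {name: list(values) for name, values in pretrained.items()}
    for vec in vectors:
        for name, delta in vec.items():
            merged[name] = [w + scale * d for w, d in zip(merged[name], delta)]
    return merged

# Toy "models": flat weight lists keyed by hypothetical parameter names.
base = {"mlp.weight": [1.0, 2.0], "attn.weight": [0.5, -0.5]}
medical = {"mlp.weight": [1.5, 2.0], "attn.weight": [0.5, -0.5]}
maths = {"mlp.weight": [1.0, 2.5], "attn.weight": [0.5, -0.5]}

tv_med = tuning_vector(base, medical)
tv_math = tuning_vector(base, maths)

# These toy shifts touch different coordinates, so they are orthogonal.
sim = cosine_similarity(tv_med, tv_math)

# Combining both shifts on top of the shared pretrained weights.
merged = apply_vectors(base, [tv_med, tv_math])
```

In this toy setup the two shifts do not overlap, so their cosine similarity is zero and adding both to the base weights keeps each change intact. With real checkpoints the same arithmetic would be applied tensor by tensor, and the `scale` factor is one possible knob for weighting a domain's contribution.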

Paper Structure

This paper contains 18 sections, 8 equations, 7 figures, 15 tables.

Figures (7)

  • Figure 1: Activated neurons across layers. Percentage of activated neurons across layers for the three model families: Meta-Llama 3, Qwen2.5, and Phi-3.5. In all families, the number of activated neurons tends to increase with layer depth. Within each model family, the proportion of activated neurons in a model remains relatively constant across layers.
  • Figure 2: Normalised edit distance between pretrained and fine-tuned models' neural activation patterns. The figure shows the layer-wise normalised edit distance between the pretrained and fine-tuned models for the three model families: Meta-Llama 3, Qwen2.5, and Phi-3.5. We note that the edit distance remains small across layers for all models.
  • Figure 3: Cosine similarity between cross-domain tuning vectors. We generally find that the vectors are orthogonal to each other. Abbreviations: Aloe: Qwen2.5-Aloe-Beta-7B, Code: Qwen2.5-Coder-7B-Instruct, Inst: Qwen2.5-7B-Instruct, Maths: Qwen2.5-Math-7B-Instruct.
  • Figure 4: Normalised accuracy of the model created by adding task vectors from Qwen2.5-Math-7B-Instruct and Qwen2.5-Aloe-Beta-7B. Blue points represent performance on medical benchmarks, while orange points represent performance on math benchmarks. (See Figure \ref{fig:adding_task_vector_med_math} for more details.)
  • Figure 5: Heat map showing the percentage drop in performance when neurons are ablated. (A) Only the top 1% of neurons are ablated; (B) only the top 5% of active neurons are ablated; (C) all active neurons are ablated.
  • ...and 2 more figures
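The Figure 2 caption compares pretrained and fine-tuned activation patterns using a normalised edit distance. One way such a metric could be computed is sketched below, assuming each pattern is a sequence of activated neuron indices, the distance is Levenshtein distance, and normalisation divides by the longer sequence length. All three choices are assumptions; this summary does not state the paper's exact definition, and the function names and example patterns are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def normalised_edit_distance(a, b):
    """Edit distance divided by the longer length; 0.0 for two empty sequences."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0

# Hypothetical activated-neuron indices for one layer of each model.
pretrained_active = [0, 3, 5, 8]
finetuned_active = [0, 3, 6, 8]

distance = normalised_edit_distance(pretrained_active, finetuned_active)
```

Here one of four activated neurons differs, so the normalised distance is 0.25. A value near zero across layers would correspond to the caption's observation that fine-tuning leaves activation patterns largely unchanged.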