Knowledge is a Region in Weight Space for Fine-tuned Language Models

Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen

TL;DR

This work reveals that finetuned language-model weights occupy dataset- and task-specific regions in weight space, forming a bounded, convex loss basin. By analyzing weight-space geometry through clustering, interpolation, and convex hull sampling, the authors show that points within these regions retain high performance and can even outperform the original finetuned endpoints. They introduce a practical approach: averaging models within a region (centroid) to initialize efficient finetuning (BitFit), achieving measurable gains across multiple datasets and especially in few-shot settings. The findings deepen our understanding of the loss landscape, suggesting new strategies for model fusion, initialization, and rapid adaptation across tasks and datasets.

Abstract

Research on neural networks has focused on understanding a single model trained on a single dataset. However, relatively little is known about the relationships between different models, particularly those trained or tested on different datasets. We address this by studying how the weight space and the underlying loss landscape of different models are interconnected. Specifically, we demonstrate that finetuned models that were optimized for high performance reside in well-defined regions in weight space, and vice versa: any model that resides anywhere in those regions also exhibits high performance. Notably, we show that language models that have been finetuned on the same dataset form a tight cluster in the weight space, while models finetuned on different datasets from the same underlying task form a looser cluster. Moreover, traversing the region between the models leads to new models that perform comparably to or even better than models obtained via finetuning, even on tasks that the original models were not finetuned on. Our findings provide insight into the relationships between models, demonstrating that a model positioned between two similar models can acquire the knowledge of both. We leverage this and design a method for selecting a better model for efficient finetuning. Specifically, we show that starting from the center of the region is as effective as, if not more effective than, starting from the pretrained model in 11 out of 12 datasets, resulting in an average accuracy improvement of 3.06.
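
The "center of the region" mentioned in the abstract can be read as a parameter-wise average of several finetuned models. The snippet below is a minimal sketch of that idea, not the authors' implementation. It assumes each model is represented as a dictionary of NumPy arrays with identical keys and shapes (a stand-in for a real checkpoint); the function name `centroid` and the toy values are hypothetical.

```python
import numpy as np


def centroid(models):
    """Average a list of weight dicts parameter by parameter.

    Assumes every dict has the same keys and matching array shapes.
    """
    keys = models[0].keys()
    return {k: np.mean([m[k] for m in models], axis=0) for k in keys}


# Toy stand-ins for three finetuned models (hypothetical values).
models = [
    {"w": np.array([1.0, 2.0]), "b": np.array([0.0])},
    {"w": np.array([3.0, 4.0]), "b": np.array([3.0])},
    {"w": np.array([5.0, 6.0]), "b": np.array([6.0])},
]

center = centroid(models)
print(center["w"], center["b"])
```

In the setting the abstract describes, the averaged weights would then serve as the starting point for efficient finetuning in place of the pretrained model.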

Paper Structure (42 sections, 1 equation, 14 figures, 3 tables)

Figures (14)

  • Figure 1: A schematic view of the weight space. Finetuning ends up in a region determined by the dataset (deep blue), which resides within the task region (light blue) and the language-tasks region (outer blue). Any combination of finetuned weights is found within the region. Each region is characterized by a low loss on the corresponding data: the dataset, the task's datasets, or diverse linguistic datasets. Generally, loss is lower inside the region than outside it or on its boundaries.
  • Figure 2: Clusters of finetuned models on different datasets or tasks, projected by t-SNE. We find that both datasets and dataset families correspond to regions in space. In each figure, each model is represented as a dot, where the inner color is the color of the dataset/task the model was finetuned with and the outer color is the color of the most common dataset/task in the cluster (representing the cluster label). Datasets/tasks names are shown in legends.
  • Figure 3: Losses of linearly interpolated models created between pairs of similar models. The best loss often lies between models. In each figure, the solid line is the average loss over the interpolations for different $\alpha$ values, the ends of each line represent the average loss of the pure finetuned models we interpolate, the Y axis is the average loss value, and the X axis is the position determined by $\alpha$. The shaded area is the standard deviation of the losses.
  • Figure 4: Loss distributions of 3 groups: In (similarly finetuned models), In' (models between models in In), and Ex (baseline models). One panel shows 5 models from the MNLI region tested on the MNLI loss, another shows models from the NLI region tested on NLI losses, and a third shows models from the General region tested on the General losses.
  • Figure 5: Losses of linearly extrapolated models created from pairs of models finetuned on MNLI. The solid line is the average losses, the vertical dashed lines indicate the average loss of the pure models we extrapolate ($\alpha=0$ or $\alpha=1$), and the X axis is the position (meaning the $\alpha$ and $(1-\alpha)$ values used in the extrapolation). The shade is the standard deviation across runs.
  • ...and 9 more figures
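
The linear interpolation and extrapolation referred to in the captions of Figures 3 and 5 can be sketched as a weighted combination of two sets of weights. This is an illustrative sketch under stated assumptions: the convention $\alpha \cdot w_a + (1-\alpha) \cdot w_b$ is assumed here rather than taken from the paper, models are represented as dictionaries of NumPy arrays, and the name `interpolate` and the toy values are hypothetical.

```python
import numpy as np


def interpolate(w_a, w_b, alpha):
    """Return alpha * w_a + (1 - alpha) * w_b for every parameter.

    alpha in [0, 1] interpolates between the two models;
    alpha outside that range extrapolates beyond them.
    """
    return {k: alpha * w_a[k] + (1.0 - alpha) * w_b[k] for k in w_a}


# Toy stand-ins for two finetuned models (hypothetical values).
w_a = {"w": np.array([0.0, 2.0])}
w_b = {"w": np.array([4.0, 6.0])}

midpoint = interpolate(w_a, w_b, 0.5)  # halfway between the two models
beyond = interpolate(w_a, w_b, 1.5)    # extrapolated past w_a, away from w_b

print(midpoint["w"], beyond["w"])
```

Under this convention, $\alpha=1$ recovers the first model and $\alpha=0$ the second, matching the "pure" models the captions place at the ends of each curve. Evaluating the loss at a sweep of $\alpha$ values would trace out curves of the kind the figures describe.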