Less is More: Local Intrinsic Dimensions of Contextual Language Models

Benjamin Matthias Ruppik, Julius von Rohrscheidt, Carel van Niekerk, Michael Heck, Renato Vukovic, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Bastian Rieck, Marcus Zibrowius, Milica Gašić

TL;DR

This work introduces a geometric lens for understanding large language model training by quantifying the local intrinsic dimension (LID) of contextual token embeddings with a localized TwoNN estimator. By analyzing token-space geometry across datasets and training stages, the authors show that fine-tuning induces dataset-specific, local compression of the embedding space, while the LIDs of out-of-distribution data remain largely unchanged. This enables unsupervised monitoring of generalization, grokking, exhaustion of training capabilities, and overfitting. The key contributions include demonstrating that reductions in mean LID forecast performance improvements and the onset of grokking, and that LID stabilization aligns with training convergence, offering a practical diagnostic that does not rely on labeled validation data. The findings provide a foundation for geometry-guided model configuration and training interventions, with potential applications to LoRA-rank tuning and other post-training phases, and motivate future work on differentiable approximations and on connections between token semantics and local dimensionality.
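To make the idea of a localized TwoNN estimator concrete, the following is a minimal NumPy sketch, not the authors' implementation. The TwoNN estimator uses, for each point, the ratio of its second- to first-nearest-neighbor distance; here it is applied in its maximum-likelihood form to the neighborhood of each sampled anchor embedding, giving one local estimate per anchor. The function names, Euclidean distances, the inclusion of the anchor in its own neighborhood, and the absence of duplicate embeddings are all assumptions. The parameters `n_anchors` and `n_neighbors` are hypothetical stand-ins that we assume play roles analogous to the $N$ and $L$ parameters quoted in the figure captions.

```python
import numpy as np


def twonn_mle(points: np.ndarray) -> float:
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form) for one point cloud.

    Assumes no duplicate points (a zero nearest-neighbor distance would break the ratio).
    """
    # Pairwise Euclidean distances between all points in the cloud.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # Ignore each point's zero distance to itself.
    np.fill_diagonal(dists, np.inf)
    # r1, r2: distances to the first and second nearest neighbors.
    nearest_two = np.sort(dists, axis=1)[:, :2]
    mu = nearest_two[:, 1] / nearest_two[:, 0]
    # Under the TwoNN model, mu is Pareto-distributed with shape d; the MLE is n / sum(log mu).
    return len(mu) / np.sum(np.log(mu))


def local_twonn(embeddings: np.ndarray, n_anchors: int, n_neighbors: int, seed: int = 0) -> np.ndarray:
    """Hypothetical localized variant: TwoNN on the n_neighbors-point neighborhood of each anchor."""
    rng = np.random.default_rng(seed)
    anchor_idx = rng.choice(len(embeddings), size=n_anchors, replace=False)
    estimates = []
    for i in anchor_idx:
        # Neighborhood = the n_neighbors points closest to the anchor (anchor included).
        d = np.linalg.norm(embeddings - embeddings[i], axis=1)
        neighborhood = embeddings[np.argsort(d)[:n_neighbors]]
        estimates.append(twonn_mle(neighborhood))
    return np.array(estimates)
```

On synthetic points lying in a 3-dimensional linear subspace of $\mathbb{R}^{50}$, the mean of the returned local estimates should come out close to 3; averaging the returned array yields a mean local estimate of the kind tracked in the figures.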

Abstract

Understanding the internal mechanisms of large language models (LLMs) remains a challenging and complex endeavor. Even fundamental questions, such as how fine-tuning affects model behavior, often require extensive empirical evaluation. In this paper, we introduce a novel perspective based on the geometric properties of contextual latent embeddings to study the effects of training and fine-tuning. To that end, we measure the local dimensions of a contextual language model's latent space and analyze their shifts during training and fine-tuning. We show that the local dimensions provide insights into the model's training dynamics and generalization ability. Specifically, the mean of the local dimensions predicts when the model's training capabilities are exhausted (as exemplified in a dialogue state tracking task), overfitting (as demonstrated in an emotion recognition task), and grokking (as illustrated with an arithmetic task). Furthermore, our experiments suggest a practical heuristic: reductions in the mean local dimension tend to accompany and predict subsequent performance gains. Through this exploration, we aim to provide practitioners with a deeper understanding of the implications of fine-tuning on embedding spaces, facilitating informed decisions when configuring models for specific applications. The results of this work contribute to the ongoing discourse on the interpretability, adaptability, and generalizability of LLMs by bridging the gap between intrinsic model mechanisms and geometric properties in the respective embeddings.
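The abstract states that the mean local dimension signals when training capabilities are exhausted, and the TL;DR links LID stabilization to training convergence. One hypothetical way to turn that observation into a monitoring rule is a plateau check over per-checkpoint mean LID values, sketched below. The function name, the relative tolerance `rel_tol`, and the `patience` window are illustrative assumptions, not a criterion taken from the paper.

```python
from typing import Optional, Sequence


def first_stable_checkpoint(
    mean_lids: Sequence[float], rel_tol: float = 0.01, patience: int = 3
) -> Optional[int]:
    """Return the index of the first checkpoint from which the mean LID changes by less
    than rel_tol (relative to the previous checkpoint) for `patience` consecutive steps.

    Hypothetical heuristic for illustration; rel_tol and patience are arbitrary choices.
    Returns None if no such plateau is found.
    """
    streak = 0
    for t in range(1, len(mean_lids)):
        prev, curr = mean_lids[t - 1], mean_lids[t]
        rel_change = abs(curr - prev) / abs(prev)
        if rel_change < rel_tol:
            streak += 1
            if streak == patience:
                # The plateau began `patience` steps before checkpoint t.
                return t - patience
        else:
            # A large change resets the streak of small changes.
            streak = 0
    return None
```

Because it only needs the mean LID per checkpoint, a check like this could run without labeled validation data; whether a given tolerance actually matches convergence would have to be verified empirically for each task.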

Paper Structure

This paper contains 42 sections, 3 equations, 17 figures, 1 table, 1 algorithm.

Figures (17)

  • Figure 1: Comparison of local intrinsic dimensions (LIDs) across three datasets. The violin plots show the distribution of the local estimates over tokens, together with their means and quartiles. The LID of embeddings originating from the fine-tuning distribution (MultiWOZ) differs markedly between models, whereas the LIDs for the out-of-distribution corpora (Wikipedia, Reddit) are almost indistinguishable.
  • Figure 2: Training a model on addition mod $p = 197$ with different training data fractions selected from $\{0.1; 0.15; 0.2; 0.25; 0.3; 0.4; 0.5\}$. The plots show the development over 60,000 batches, with the mean and 95% confidence interval over five training seeds per configuration (plots per seed are in the appendix). The mean local estimates are computed on the training split with parameters $N = 3000$ and $L = 64$. Dashed lines highlight the runs where grokking did not occur.
  • Figure 3: Development of TripPy-R performance measures (model loss in green; joint goal accuracy in orange) compared with mean local dimension estimates (blue) evaluated on the training, validation, and test split of the MultiWOZ dataset. We show the mean and standard deviation of the measures evaluated at the end of each epoch over six different model training seeds.
  • Figure 4: Development of emotion recognition model performance measures (loss in green; weighted F1 in orange; macro F1 in red) compared with mean local dimension estimates (blue) evaluated on the training, validation, and test split of the EmoWOZ dataset. We show the mean and standard deviation of the measures evaluated at the end of each epoch over four different training seeds.
  • Figure 5: Boxplots of the mean local TwoNN estimates for sequence sample sizes $M$ ranging from 2,000 to 16,000. We compute the mean estimates for five different sequence sub-sampling seeds on each of the three splits of the MultiWOZ dataset. Here and in all subsequent boxplots, the average of the mean estimates is shown as green triangles and their median as an orange line; outliers are shown as circles.
  • ...and 12 more figures
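The grokking setup in Figure 2 trains on addition modulo $p = 197$ with a varying fraction of the data used for training. A minimal sketch of how such a split could be built is shown below, assuming a uniform random split over all $p^2$ ordered pairs $(a, b)$ with label $(a + b) \bmod p$. The function name and the splitting procedure are hypothetical; the paper's exact tokenization and split may differ.

```python
import numpy as np


def modular_addition_split(p: int = 197, train_fraction: float = 0.3, seed: int = 0):
    """Hypothetical split of all equations a + b = (a + b) mod p into train/validation sets.

    Assumption: a uniform random split over all ordered pairs (a, b) with 0 <= a, b < p.
    """
    a, b = np.meshgrid(np.arange(p), np.arange(p), indexing="ij")
    pairs = np.stack([a.ravel(), b.ravel()], axis=1)  # shape (p * p, 2)
    labels = (pairs[:, 0] + pairs[:, 1]) % p
    # Shuffle all equations once, then take the first train_fraction of them for training.
    rng = np.random.default_rng(seed)
    perm = rng.permutation(len(pairs))
    n_train = int(round(train_fraction * len(pairs)))
    train_idx, val_idx = perm[:n_train], perm[n_train:]
    return (pairs[train_idx], labels[train_idx]), (pairs[val_idx], labels[val_idx])
```

For example, `modular_addition_split(197, 0.3)` yields 11,643 training equations out of $197^2 = 38{,}809$, with the remainder held out for validation.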