DOCS: Quantifying Weight Similarity for Deeper Insights into Large Language Models

Zeping Min, Xinshang Wang

TL;DR

This paper tackles the challenge of interpreting Large Language Models by measuring the similarity of weight matrices rather than of representations. It introduces DOCS, the Distribution of Cosine Similarity, which computes a max-cosine alignment between the column vectors of two weight matrices, fits Gumbel distributions to the resulting maxima, and defines S_DOCS = (u_X + u_Y)/2 to quantify similarity. Theoretical results show that DOCS can discriminate between orthogonal matrices, which prior similarity indices fail to do. Experiments across open-source LLMs reveal that adjacent layers are often weight-similar, that coherent layer clusters exist, and that instruction-tuned models largely preserve the weights of their base models. The work has implications for architecture design, sparsity, and knowledge distillation, and it provides a practical tool for more interpretable analysis of LLMs and for future efficiency-focused development.
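
The pipeline described above (max-cosine alignment, Gumbel fit, average of the two location parameters) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, matrices are passed as lists of column vectors, the absolute value of the cosine is assumed, and the Gumbel location parameter is estimated with a simple method-of-moments formula rather than whatever fitting procedure the paper uses.

```python
import math
import statistics

# Euler-Mascheroni constant, used by the method-of-moments Gumbel estimate.
EULER_GAMMA = 0.5772156649015329


def cosine(u, v):
    """Cosine similarity of two vectors (assumes neither has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gumbel_location(samples):
    """Method-of-moments estimate of the Gumbel location parameter u.

    Assumption: this stands in for the paper's Gumbel fit. For a Gumbel
    distribution, mean = u + gamma * beta and std = beta * pi / sqrt(6).
    """
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    beta = std * math.sqrt(6) / math.pi
    return mean - EULER_GAMMA * beta


def docs_similarity(x_cols, y_cols):
    """Sketch of S_DOCS = (u_X + u_Y) / 2 for two lists of column vectors."""
    # Absolute cosine similarity between every column of X and every column of Y.
    c = [[abs(cosine(xc, yc)) for yc in y_cols] for xc in x_cols]
    # Best match in Y for each column of X, and in X for each column of Y.
    s_x = [max(row) for row in c]
    s_y = [max(col) for col in zip(*c)]
    return (gumbel_location(s_x) + gumbel_location(s_y)) / 2


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    permuted = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    print(docs_similarity(identity, permuted))
```

In this sketch, permuting the columns of a matrix or rescaling it by a constant leaves the score unchanged, because only the per-column maxima of the cosine matrix enter the estimate.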

Abstract

We introduce a novel index, the Distribution of Cosine Similarity (DOCS), for quantitatively assessing the similarity between weight matrices in Large Language Models (LLMs), aiming to facilitate the analysis of their complex architectures. Leveraging DOCS, our analysis uncovers intriguing patterns in the latest open-source LLMs: adjacent layers frequently exhibit high weight similarity and tend to form clusters, suggesting depth-wise functional specialization. Additionally, we prove that DOCS is theoretically effective in quantifying similarity for orthogonal matrices, a crucial aspect given the prevalence of orthogonal initializations in LLMs. This research contributes to a deeper understanding of LLM architecture and behavior, offering tools with potential implications for developing more efficient and interpretable models.

Paper Structure

This paper contains 35 sections, 13 theorems, 91 equations, 17 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

For $n \geq 2$, there exist $m = \Omega(n)$ and column-orthogonal matrices $X, Y \in \mathbb{R}^{n \times m}$ such that their Frobenius norm difference and DOCS similarity satisfy:

Figures (17)

  • Figure 1: Comparison of similarity indices applied to representation similarities ((a) and (b)) and weight similarities ((c) and (d)) across different layers of Llama 3.1-8B-Instruct (Dubey et al., 2024).
  • Figure 2: Comparison of similarity indices on the MLP-Up layers of Meta-Llama-3.1-8B-Instruct.
  • Figure 3: The top row displays heatmaps of DOCS scores between layers for different weight matrices in gemma-2-27b-it. The bottom row illustrates the relationship between DOCS scores and the distance between layers.
  • Figure 4: Analysis of $W_v$ matrices across various LLMs. Top row: Heatmaps visualize DOCS similarity scores between transformer layers. Bottom row: Average DOCS scores are computed for diagonal blocks (sizes 3x3 to 7x7) within each heatmap.
  • Figure 5: Average DOCS scores for diagonal blocks of varying sizes (3x3 to 7x7) within heatmaps representing different weight matrices in Llama-3.1-70B.
  • ...and 12 more figures

Theorems & Definitions (28)

  • Definition 1: Constant on Orthogonal Matrices
  • Definition 2: Dimension-Dependent on Orthogonal Matrices
  • Definition 3: Discriminative on Orthogonal Matrices
  • Theorem 1
  • Lemma 1: Permutation Transformation Invariance
  • proof
  • Lemma 2: Symmetry
  • proof
  • Lemma 3: Isotropic Scaling Invariance
  • proof
  • ...and 18 more