Aligned at the Start: Conceptual Groupings in LLM Embeddings

Mehrdad Khatir, Sanchit Kabra, Chandan K. Reddy

TL;DR

The paper investigates how base input embeddings in transformer-based LLMs encode conceptual structure prior to contextual processing, using a pipeline that combines a fuzzy graph construction over $k$-NN embeddings with Louvain community detection to extract hierarchical concept clusters. It demonstrates significant human-aligned categorization in the embedding space, notable intra-cluster organization including a topological ordering of numbers, and moderate to high alignment of concepts across diverse models and architectures. A bias-mitigation case study shows that targeted cluster modification via embedding engineering can reduce ethnicity bias while preserving task performance, highlighting practical implications for fairness and robustness. These findings offer a path toward interpretable, manipulable embedding-level representations and point to embedding engineering as a viable tool for safety and reliability in LLM applications.

Abstract

This paper shifts focus to the often-overlooked input embeddings, the initial representations fed into transformer blocks. Using fuzzy graph construction, k-nearest neighbors (k-NN), and community detection, we analyze embeddings from diverse LLMs and find significant categorical community structure that aligns with predefined, human-recognized concepts and categories. We observe that these groupings exhibit within-cluster organization (such as hierarchies and topological ordering), and hypothesize a fundamental structure that precedes contextual processing. To further investigate the conceptual nature of these groupings, we compare the input embeddings of different LLM categories and observe a medium to high degree of cross-model alignment. Furthermore, we provide evidence that manipulating these groupings can play a functional role in mitigating ethnicity bias in LLM tasks.
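
The pipeline named in the abstract (k-NN neighborhoods, a fuzzy graph, then community detection) can be illustrated with a minimal pure-Python sketch. Several things here are assumptions made for illustration and are not taken from the paper: the toy 2-D "embeddings", the exponential membership strength, the fuzzy-union rule `a + b - a*b` for symmetrizing edges, and all function names. The community step implements only the local-moving phase of Louvain (no graph aggregation), so it returns a flat partition rather than the hierarchy the paper extracts.

```python
import math
from collections import defaultdict

# Invented 2-D vectors standing in for real input embeddings (assumption).
TOY_EMBEDDINGS = {
    "one": (1.0, 0.0), "two": (1.0, 0.1), "three": (1.0, 0.25), "four": (1.0, 0.4),
    "oslo": (0.0, 1.0), "rome": (0.1, 1.0), "paris": (0.25, 1.0), "cairo": (0.4, 1.0),
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_fuzzy_graph(vectors, k):
    """Build an undirected weighted graph {(i, j): w} with i < j from k-NN lists."""
    n = len(vectors)
    directed = {}
    for i in range(n):
        sims = sorted(((cosine(vectors[i], vectors[j]), j)
                       for j in range(n) if j != i), reverse=True)[:k]
        top = sims[0][0]
        for s, j in sims:
            # Membership strength in (0, 1]: 1.0 for the nearest neighbor.
            directed[(i, j)] = math.exp(-(top - s))
    weights = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        # Fuzzy union of the two directed memberships (assumed rule).
        weights[(min(i, j), max(i, j))] = a + b - a * b
    return weights

def louvain_local_moving(n, weights):
    """First phase of Louvain: greedily move nodes to the best neighboring community."""
    adj = defaultdict(dict)
    for (i, j), w in weights.items():
        adj[i][j] = w
        adj[j][i] = w
    degree = [sum(adj[i].values()) for i in range(n)]
    two_m = sum(degree)
    community = list(range(n))
    if two_m == 0:
        return community
    total = degree[:]  # summed weighted degree per community
    moved = True
    while moved:
        moved = False
        for i in range(n):
            old = community[i]
            total[old] -= degree[i]  # take node i out of its community
            link = defaultdict(float)
            for j, w in adj[i].items():
                link[community[j]] += w
            # Modularity gain (up to a constant factor) of staying put.
            best, best_gain = old, link[old] - total[old] * degree[i] / two_m
            for c in sorted(link):
                gain = link[c] - total[c] * degree[i] / two_m
                if gain > best_gain:
                    best, best_gain = c, gain
            community[i] = best
            total[best] += degree[i]
            moved = moved or best != old
    return community

def detect_communities(embeddings, k):
    """Return {token: community_id} for a dict of token -> vector."""
    tokens = list(embeddings)
    vectors = [embeddings[t] for t in tokens]
    labels = louvain_local_moving(len(tokens), knn_fuzzy_graph(vectors, k))
    return dict(zip(tokens, labels))

if __name__ == "__main__":
    groups = defaultdict(list)
    for token, label in detect_communities(TOY_EMBEDDINGS, k=2).items():
        groups[label].append(token)
    for members in groups.values():
        print(members)
```

On the toy data the number-like and city-like tokens fall into two separate communities. With real embeddings, varying `k` changes the granularity of the graph, which is the knob the paper's Figure 1 refers to.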

Paper Structure

This paper contains 33 sections, 1 theorem, 7 equations, 5 figures, 13 tables, 2 algorithms.

Key Result

Lemma 1

Local Ordering on a Manifold: Let $M$ be a manifold and let $d(a, b)$ denote the distance between points $a$ and $b$ on $M$. For a given positive integer $k$, we say that a point $x$ on $M$ is locally ordered if and only if: [condition not preserved in this extract] where $\mathrm{top}_k(a)$ denotes the set of $k$-nearest neighbors of point $a$ on $M$.

Figures (5)

  • Figure 1: The identified name and location communities at different $k$ granularities for the Albert model. On the left, the average precision score of the extracted graph at each granularity is given. For more detailed tables, see the appendix (app:albert); results for other models are available in the appendix (app:alignment).
  • Figure 2: Visualization of the identified name and location communities containing more than 10 entities. A UMAP projection, plotted with Seaborn (Waskom, 2021), is used for the visualization.
  • Figure 3: Simplified steps showing how external information is understood and retained. Once a newly encountered word or entity is understood, it is typically stored in semantic memory. The existence of semantic memory (on the left) allows previously encountered words and entities to carry a form of meaning even without an external context. The scatter box on the right shows the community (primarily associated with moving creatures) that we extracted from the Albert model (Lan et al., 2019).
  • Figure 4: Visualization of the social structure cluster and its associated identified sub-clusters.
  • Figure 5: Visualization of the hierarchical communities from Albert. The green blocks show the clusters that are evaluated and discussed in this paper.

Theorems & Definitions (1)

  • Lemma 1