Aligned at the Start: Conceptual Groupings in LLM Embeddings

Mehrdad Khatir; Sanchit Kabra; Chandan K. Reddy

TL;DR

The paper investigates how base input embeddings in transformer-based LLMs encode conceptual structure prior to contextual processing. It uses a pipeline that builds a fuzzy graph over the $k$-nearest-neighbor ($k$-NN) structure of the embeddings and then applies Louvain community detection to extract hierarchical concept clusters. It demonstrates significant human-aligned categorization in the embedding space, notable intra-cluster organization (including a topological ordering of numbers), and moderate to high alignment of concepts across diverse models and architectures. A bias-mitigation case study shows that targeted cluster modification via embedding engineering can reduce ethnicity bias while preserving task performance, highlighting practical implications for fairness and robustness. These findings offer a path toward interpretable, manipulable embedding-level representations and point to embedding engineering as a viable tool for safety and reliability in LLM applications.
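The fuzzy-graph step can be illustrated with a minimal sketch. The summary does not spell out the membership function, so this assumes a UMAP-style fuzzy simplicial set. Each point's k nearest neighbors receive memberships exp(-(d - rho_i) / sigma_i), where rho_i is the distance to the nearest neighbor and sigma_i is calibrated so that the memberships sum to log2(k). The directed memberships are then symmetrized with the fuzzy union a + b - ab. The function names are hypothetical. Brute-force neighbor search stands in for the approximate search that a vocabulary-sized embedding matrix would realistically need.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[int, int]


def knn(points: List[Point], k: int) -> List[List[Tuple[int, float]]]:
    """Brute-force k nearest neighbours of every point as (index, distance), self excluded."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            ((j, math.dist(p, q)) for j, q in enumerate(points) if j != i),
            key=lambda t: (t[1], t[0]),  # ties broken by index for determinism
        )
        result.append(dists[:k])
    return result


def calibrate_sigma(dists: List[float], rho: float, k: int, iters: int = 64) -> float:
    """Bisection for sigma so that sum_j exp(-(d_j - rho) / sigma) is close to log2(k)."""
    target = math.log2(k)
    lo, hi = 1e-6, 1e6
    for _ in range(iters):
        mid = (lo + hi) / 2
        total = sum(math.exp(-max(0.0, d - rho) / mid) for d in dists)
        if total > target:
            hi = mid  # memberships too large: shrink the bandwidth
        else:
            lo = mid  # memberships too small: widen the bandwidth
    return (lo + hi) / 2


def fuzzy_knn_graph(points: List[Point], k: int) -> Dict[Edge, float]:
    """Undirected fuzzy graph mapping edge (i, j), i < j, to a membership in [0, 1].

    Assumes 1 <= k < len(points).
    """
    directed: Dict[Edge, float] = {}
    for i, nbrs in enumerate(knn(points, k)):
        dists = [d for _, d in nbrs]
        rho = dists[0]  # distance to the nearest neighbour
        sigma = calibrate_sigma(dists, rho, k)
        for j, d in nbrs:
            directed[(i, j)] = math.exp(-max(0.0, d - rho) / sigma)
    graph: Dict[Edge, float] = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        graph[(min(i, j), max(i, j))] = a + b - a * b  # fuzzy union (probabilistic t-conorm)
    return graph
```

For example, with k = 3 and two well-separated groups of four 2-D points, every edge in the resulting graph connects points from the same group. The graph then serves as the weighted input for community detection.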

Abstract

This paper shifts focus to the often-overlooked input embeddings: the initial representations fed into transformer blocks. Using fuzzy graphs, k-nearest neighbor (k-NN) search, and community detection, we analyze embeddings from diverse LLMs. We find significant categorical community structure that matches predefined, human-aligned concepts and categories. These groupings exhibit within-cluster organization (such as hierarchies and topological ordering), which leads us to hypothesize a fundamental structure that precedes contextual processing. To further investigate the conceptual nature of these groupings, we explore cross-model alignment of input embeddings across different LLM categories and observe a medium to high degree of alignment. Furthermore, we provide evidence that manipulating these groupings can play a functional role in mitigating ethnicity bias in LLM tasks.
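The TL;DR names Louvain as the community-detection step. The sketch below is a minimal pure-Python version of Louvain modularity optimization. It assumes an undirected weighted graph stored as a symmetric dict of dicts, such as a fuzzy k-NN graph over token embeddings. The helper names `from_edges` and `louvain_levels` are hypothetical, and this is not the authors' implementation. Each level repeatedly moves single nodes to the neighboring community with the largest modularity gain, then collapses communities into super-nodes. The per-level partitions form a coarsening hierarchy, which is one way hierarchical concept clusters can arise.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Node = Hashable
Graph = Dict[Node, Dict[Node, float]]  # symmetric adjacency; graph[u][u] is a self-loop weight


def from_edges(edges: List[Tuple[Node, Node, float]]) -> Graph:
    """Build a symmetric adjacency dict from (u, v, weight) triples."""
    graph: Graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})
        graph.setdefault(v, {})
        graph[u][v] = graph[u].get(v, 0.0) + w
        if u != v:
            graph[v][u] = graph[v].get(u, 0.0) + w
    return graph


def _local_moving(graph: Graph, rng: random.Random) -> Tuple[Dict[Node, Node], bool]:
    """Phase 1: move single nodes to the neighbouring community with the largest modularity gain."""
    two_m = sum(w for nbrs in graph.values() for w in nbrs.values())
    degree = {u: sum(nbrs.values()) for u, nbrs in graph.items()}
    comm = {u: u for u in graph}  # every node starts in its own community
    tot = dict(degree)  # summed degree of each community
    nodes, moved_any, improved = list(graph), False, True
    while improved:
        improved = False
        rng.shuffle(nodes)
        for u in nodes:
            old = comm[u]
            links: Dict[Node, float] = defaultdict(float)
            for v, w in graph[u].items():
                if v != u:
                    links[comm[v]] += w
            tot[old] -= degree[u]  # take u out of its community
            # Gain of inserting u into community c, up to a positive constant factor.
            best, best_gain = old, links[old] - degree[u] * tot[old] / two_m
            for c, k_uc in links.items():
                gain = k_uc - degree[u] * tot[c] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            tot[best] += degree[u]
            comm[u] = best
            if best != old:
                improved = moved_any = True
    return comm, moved_any


def _aggregate(graph: Graph, comm: Dict[Node, Node]) -> Graph:
    """Phase 2: collapse each community into a super-node; internal weight becomes a self-loop."""
    agg: Graph = {c: {} for c in set(comm.values())}
    for u, nbrs in graph.items():
        row = agg[comm[u]]
        for v, w in nbrs.items():
            row[comm[v]] = row.get(comm[v], 0.0) + w
    return agg


def louvain_levels(graph: Graph, seed: int = 0) -> List[List[Set[Node]]]:
    """Run Louvain and return the partition of the original nodes after each level (finest first)."""
    rng = random.Random(seed)
    members: Dict[Node, Set[Node]] = {u: {u} for u in graph}
    levels: List[List[Set[Node]]] = []
    while True:
        comm, moved = _local_moving(graph, rng)
        if not moved:
            return levels
        merged: Dict[Node, Set[Node]] = {}
        for u, c in comm.items():
            merged.setdefault(c, set()).update(members[u])
        members = merged
        levels.append(list(members.values()))
        graph = _aggregate(graph, comm)
```

On two triangles joined by a single edge, the first level already separates the triangles, and the second level finds no improving move. Library implementations such as NetworkX's `louvain_communities` add resolution and threshold parameters that this sketch omits.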

Paper Structure (33 sections, 1 theorem, 7 equations, 5 figures, 13 tables, 2 algorithms)

Introduction
Preliminaries
Static, Contextual and Base Embeddings
Previous Works on Embedding Interpretability
Concept Extraction
Graph Construction
Louvain Community Detection
Concept Extraction Algorithm
Evaluation: Alignment with External Knowledge
Named Entities
Symbols-Numbers
LLM-LLM alignment
Bias Mitigation: Case Study of Cluster Modification
Conclusion
Limitations and Risks
...and 18 more sections

Key Result

Lemma 1

Local Ordering on a Manifold: Let $M$ be a manifold and let $d(a, b)$ denote the distance between points $a$ and $b$ on $M$. For a given positive integer $k$, we say that a point $x$ on $M$ is locally ordered if and only if: where $\mathrm{top}_k(a)$ denotes the set of $k$-nearest neighbors of point $a$ on $M$.
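The neighborhood operator $\mathrm{top}_k(a)$ used in Lemma 1 can be made concrete with a short sketch. This assumes that points are given as coordinate tuples and that $d$ is Euclidean distance (`math.dist`), standing in for the manifold distance. A different metric can be passed in instead. The helper name `top_k` and the toy one-dimensional "number" coordinates are hypothetical, not real embeddings.

```python
import math
from typing import Callable, List, Sequence

Point = Sequence[float]


def top_k(a: int, points: List[Point], k: int,
          dist: Callable[[Point, Point], float] = math.dist) -> List[int]:
    """Indices of the k nearest neighbours of points[a], excluding a itself.

    Ties are broken by index so the result is deterministic.
    """
    others = [i for i in range(len(points)) if i != a]
    others.sort(key=lambda i: (dist(points[a], points[i]), i))
    return others[:k]


if __name__ == "__main__":
    nums = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]  # toy 1-D coordinates
    print(top_k(1, nums, 2))  # [0, 2]
    print(top_k(4, nums, 2))  # [3, 2]
```

On an ordered line like this, each point's neighborhood consists of its immediate predecessors and successors. This is the intuition behind a topological ordering of numbers.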

Figures (5)

Figure 1: The identified name and location communities at different granularities k for the Albert model. The left side gives the average precision score of the extracted graph at each granularity. For more detailed tables, see the Albert appendix; results for other models are available in the alignment appendix.
Figure 2: Visualization of the identified name and location communities containing more than 10 entities. The visualization uses a UMAP projection plotted with Seaborn (Waskom2021).
Figure 3: Simplified steps showing how external information is understood and retained. Once a newly encountered word/entity is understood, it is typically stored in semantic memory. The existence of semantic memory (left) allows previously encountered words/entities to carry a form of meaning even without external context. The scatter box on the right shows a community (primarily associated with moving creatures) that we extracted from the Albert model (lan2019albert).
Figure 4: Visualization of the social structure cluster and its associated identified sub-clusters.
Figure 5: Visualization of the hierarchical communities from Albert. The green blocks show the clusters that are evaluated and discussed in this paper.

