A short survey on almost orthogonal vectors in a few specific large dimensions

Rami Luisto

TL;DR

The paper asks how many ε-almost orthogonal directions can be packed in ℝ^n, with emphasis on the embedding-space dimensions used by contemporary transformers. It surveys mathematical preliminaries (volumes, spherical caps, and special functions), then relates these to bounds from the Johnson-Lindenstrauss lemma and from spherical codes/sphere packing, and complements them with simulations that explore generation methods based on random vectors, random projections, and energy minimization. It finds that pure volume bounds are loose in high dimensions and that JL-type bounds are asymptotic and often not tight for moderate n (e.g., n ≈ 768); in contrast, simulation-based approaches can yield substantially more near-orthogonal directions than the n vectors of the standard basis, with energy-based methods producing characteristic bimodal cosine distributions in lower dimensions. The results illustrate the sizable capacity of high-dimensional embedding spaces to host many almost-orthogonal directions and suggest practical and theoretical avenues for exploiting directional structure that is stored lossily in AI embedding spaces.

Abstract

The concept of \emph{almost orthogonal vectors}, i.e.\ vectors whose cosine similarity is close to $0$, relates to topics both in pure mathematics and in coding theory under the guises of spherical packing and spherical codes. In recent years the rise of advanced language models in AI has created new interest in this concept, as the models seem to store certain concepts as almost orthogonal directions in high-dimensional spaces. In this survey we present some ideas regarding almost orthogonal vectors through three approaches: (1) the mathematical theory of almost orthogonality, (2) some observations from the embedding spaces of language models, and (3) generation of large sets of almost orthogonal vectors by simulations.
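As a rough illustration of approach (3), the sketch below samples random unit vectors and reports the largest absolute pairwise cosine similarity. It is not the survey's simulation code: the helper names `random_unit_vector` and `max_abs_cosine`, the sample size, and the seed are assumptions chosen for illustration, and only the Python standard library is used.

```python
import math
import random


def random_unit_vector(n, rng):
    """Sample a direction uniformly on the unit sphere by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def max_abs_cosine(vectors):
    """Largest |cosine similarity| over all pairs; inputs are assumed to be unit vectors."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            # For unit vectors the dot product equals the cosine similarity.
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot))
    return worst


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed (arbitrary choice) for repeatability
    n, m = 768, 100         # n = 768 as in the TL;DR; m is an illustrative sample size
    vectors = [random_unit_vector(n, rng) for _ in range(m)]
    print(f"n={n}, m={m}, max |cos| = {max_abs_cosine(vectors):.3f}")
```

For independent uniformly random unit vectors in ℝ^n the cosine similarity has mean 0 and standard deviation 1/√n (about 0.036 for n = 768), so a set is ε-almost orthogonal whenever the value printed here is below ε.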

Paper Structure

This paper contains 22 sections, 1 theorem, 30 equations, 17 figures, 7 tables.

Key Result

Lemma 4.1

Let $0 < \varepsilon < 1$ and let $S \colonequals \{ x_1, \ldots, x_k \} \subset \mathbb R^N$. Then for any $n > \frac{8\log(k)}{\varepsilon^2}$ there exists a linear map $L\colon \mathbb R^N \to \mathbb R^n$ such that for all $x_i, x_j \in S$, $$(1-\varepsilon)\|x_i - x_j\|^2 \leq \|L(x_i) - L(x_j)\|^2 \leq (1+\varepsilon)\|x_i - x_j\|^2.$$
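The lemma is an existence statement, and a common way to exhibit such a map is to draw random Gaussian matrices until one works. The sketch below does that under two assumptions that the text above does not spell out: the conclusion is taken in its usual squared-distance form, $(1-\varepsilon)\|x_i - x_j\|^2 \leq \|L(x_i) - L(x_j)\|^2 \leq (1+\varepsilon)\|x_i - x_j\|^2$, and $\log$ is read as the natural logarithm. All function names are hypothetical, and the resampling loop is an illustrative choice rather than the paper's procedure.

```python
import math
import random


def jl_dimension(k, eps):
    """Smallest integer n with n > 8*log(k)/eps^2 (natural log assumed)."""
    return math.floor(8 * math.log(k) / eps ** 2) + 1


def random_projection(N, n, rng):
    """n x N matrix with i.i.d. N(0, 1/n) entries, one common JL construction."""
    scale = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(N)] for _ in range(n)]


def apply_map(matrix, x):
    """Matrix-vector product L(x)."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


def sq_dist(x, y):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def worst_distortion(points, images):
    """Largest | ||L(xi)-L(xj)||^2 / ||xi-xj||^2 - 1 | over all pairs of distinct points."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = sq_dist(images[i], images[j]) / sq_dist(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst


def find_jl_map(points, eps, rng, max_tries=50):
    """Resample random projections until all pairwise squared distances are within 1 +/- eps."""
    N, n = len(points[0]), jl_dimension(len(points), eps)
    for _ in range(max_tries):
        L = random_projection(N, n, rng)
        images = [apply_map(L, x) for x in points]
        distortion = worst_distortion(points, images)
        if distortion <= eps:
            return L, distortion
    raise RuntimeError("no suitable projection found; increase max_tries")


if __name__ == "__main__":
    rng = random.Random(0)
    points = [[rng.gauss(0.0, 1.0) for _ in range(300)] for _ in range(20)]
    L, distortion = find_jl_map(points, eps=0.5, rng=rng)
    print(f"target dimension n={len(L)}, worst distortion={distortion:.3f}")
```

A single random draw is not guaranteed to satisfy the inequality, which is why the sketch retries; the target dimension depends only on $k$ and $\varepsilon$, not on the ambient dimension $N$.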

Figures (17)

  • Figure 1: The distribution of pairwise cosine similarities of input embeddings for various models. The dimension of the embedding space is in parenthesis after the model name.
  • Figure 2: The distribution of pairwise cosine similarities of input embeddings for various models after being scaled as a probability distribution and normalized. Normal distribution is shown for comparison.
  • Figure 3: Volume comparisons of balls in different dimensions. The maximum value for the unit ball occurs at $n=5$.
  • Figure 4: An illustration of how much of the ambient space of a cube edge is part of the cube.
  • Figure 5: Example image of spherical caps.
  • ...and 12 more figures
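The claim in the caption of Figure 3, that the volume of the unit ball is largest at $n=5$, can be checked numerically. The sketch below assumes the standard closed form $V_n = \pi^{n/2} / \Gamma(n/2 + 1)$ for the Euclidean unit ball, which is not quoted in this summary; the function names are illustrative. It works with the logarithm of the volume so that large dimensions such as $n = 768$ do not overflow.

```python
import math


def log_unit_ball_volume(n):
    """log of the n-dimensional unit ball volume, log(pi^(n/2) / Gamma(n/2 + 1))."""
    return (n / 2) * math.log(math.pi) - math.lgamma(n / 2 + 1)


def unit_ball_volume(n):
    """Volume of the n-dimensional Euclidean unit ball (underflows to 0.0 for very large n)."""
    return math.exp(log_unit_ball_volume(n))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, round(unit_ball_volume(n), 4))
    # Integer dimension with the largest unit-ball volume.
    print("argmax:", max(range(1, 101), key=log_unit_ball_volume))
    # In embedding-sized dimensions the volume is astronomically small.
    print("log-volume at n=768:", round(log_unit_ball_volume(768), 1))
```

Among integer dimensions the maximum is attained at $n=5$, after which the volume decays faster than exponentially.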

Theorems & Definitions (6)

  • Definition 3.1
  • Definition 3.2
  • Definition 3.3
  • Definition 3.4
  • Lemma 4.1: Johnson-Lindenstrauss Lemma
  • Remark 4.2