Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries

Neil He, Jiahong Liu, Buze Zhang, Ngoc Bui, Ali Maatouk, Menglin Yang, Irwin King, Melanie Weber, Rex Ying

TL;DR

This work argues that Euclidean geometry is insufficient for scaling next-generation foundation models on real-world data that exhibit non-Euclidean structures such as hierarchies and cycles. It advocates for curvature-aware, non-Euclidean foundations—including hyperbolic, spherical, and mixed-curvature geometries—through three development paths: fine-tuning existing Euclidean models, pretraining new non-Euclidean models, and hybrid architectures that blend geometries. The authors provide theoretical insights (distortion-dimension trade-offs and Markov convexity) and empirical evidence showing that non-Euclidean embeddings achieve lower distortion and better capture hierarchical and multi-modal structures, potentially improving representational efficiency and transfer. If adopted, curvature-aware foundation models could enhance scalability and adaptability while mitigating issues like hallucinations, enabling more efficient cross-modal learning and more faithful representations of complex data geometry.

Abstract

In the era of foundation models and Large Language Models (LLMs), Euclidean space has been the de facto geometric setting for machine learning architectures. However, recent literature has demonstrated that this choice comes with fundamental limitations. At a large scale, real-world data often exhibits inherently non-Euclidean structures, such as multi-way relationships, hierarchies, symmetries, and non-isotropic scaling, in a variety of domains, such as languages, vision, and the natural sciences. It is challenging to effectively capture these structures within the constraints of Euclidean spaces. This position paper argues that moving beyond Euclidean geometry is not merely an optional enhancement but a necessity to maintain the scaling law for the next generation of foundation models. By adopting non-Euclidean geometries, foundation models could more efficiently leverage the aforementioned structures. Task-aware adaptability that dynamically reconfigures embeddings to match the geometry of downstream applications could further enhance efficiency and expressivity. Our position is supported by a series of theoretical and empirical investigations of prevalent foundation models. Finally, we outline a roadmap for integrating non-Euclidean geometries into foundation models, including strategies for building geometric foundation models via fine-tuning, training from scratch, and hybrid approaches.

Paper Structure

This paper contains 26 sections, 4 theorems, 3 equations, 6 figures, 4 tables.

Key Result

Theorem 3.1

(matousek2002lectures) Let $X$ be an $n$-point metric space with uniform distance $1$, i.e., an unweighted complete graph with $n$ nodes. For $\epsilon > 0$, the minimal $d$ such that $X$ can be embedded into $\mathbb{R}^d$ with distortion $(1+\epsilon)$ is $d = \Omega\left(\frac{\log(n)}{\epsilon^2
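
The setting of Theorem 3.1 can be explored numerically. The sketch below is illustrative only and is not taken from the paper: it embeds the uniform metric on $n$ points exactly into $\mathbb{R}^n$ using scaled standard basis vectors, then measures how the distortion grows when the points are pushed into fewer dimensions. The Gaussian random projection and the helper names `distortion`, `simplex_embedding`, and `random_projection` are assumptions chosen for the demonstration; a random projection is only one possible embedding and does not by itself establish the lower bound.

```python
import itertools
import math
import random


def distortion(points):
    """Distortion of an embedding of the uniform metric.

    All original pairwise distances are equal, so the distortion is the
    largest embedded distance divided by the smallest embedded distance.
    """
    dists = [math.dist(p, q) for p, q in itertools.combinations(points, 2)]
    return max(dists) / min(dists)


def simplex_embedding(n):
    """Exact embedding into R^n: scaled basis vectors, all distances equal to 1."""
    s = 1.0 / math.sqrt(2.0)
    return [[s if i == j else 0.0 for j in range(n)] for i in range(n)]


def random_projection(points, d, seed=0):
    """Map points to R^d with a Gaussian random matrix (illustrative choice)."""
    rng = random.Random(seed)
    dim = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) / math.sqrt(d) for _ in range(dim)]
              for _ in range(d)]
    return [[sum(row[k] * p[k] for k in range(dim)) for row in matrix]
            for p in points]


if __name__ == "__main__":
    n = 64
    exact = simplex_embedding(n)
    print("d =", n, "distortion =", round(distortion(exact), 3))
    for d in (32, 16, 8, 4):
        projected = random_projection(exact, d)
        print("d =", d, "distortion =", round(distortion(projected), 3))
```

Running the script prints the distortion for each target dimension; the exact embedding has distortion 1, and the projected embeddings have distortion at least 1 by definition.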

Figures (6)

  • Figure 1: Manifolds and their corresponding graph structures or underlying relationships, which represent different types of token relationships: hierarchical (left), uniform (middle), and cyclical (right) dependencies.
  • Figure 2: Token frequency vs. token count (left two) and token norm vs. token count (right two) for LLaMa3.1-8B and LLaMaGen. The datasets are chosen to be within the training corpus. The token-frequency plots show the scale-free property of the token inputs. The token-norm plots reflect this property in the learned token embeddings to some extent, with token count increasing exponentially for high-norm tokens at the left tail. However, the Euclidean embeddings still do not fully capture this property and deviate from it at the right tail. More statistics are shown in \ref{additional_statistics}.
  • Figure 3: Distortion for embedding a tree with 96 nodes for varying dimensionality (log scale). Non-Euclidean geometry achieves smaller distortion with significantly fewer dimensions and has better scaling.
  • Figure 4: Roadmap for integrating non-Euclidean geometries into foundation models, including (a) fine-tuning existing Euclidean foundation models, (b) pretraining from scratch, and (c) hybrid architectures. Four strategies are shown in (a), labeled with circled numbers 1-4, respectively: geometric prompt tuning, geometric low-rank adaptation, geometric knowledge distillation, and geometric transfer learning. All learnable components are highlighted in red in (a) and (c).
  • Figure 5: Distortion of embedding a complete tree, a cycle, and a ring of trees into manifolds of different dimensions (log scale). Each graph has 96 nodes. Euclidean embeddings are shown in blue. In all cases, non-Euclidean geometry achieves significantly smaller distortion with significantly fewer dimensions. The distortion for Euclidean embeddings always plateaus, demonstrating that Euclidean space is not suited for embedding these structures regardless of its dimension.
  • ...and 1 more figure
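
One way to build intuition for why hyperbolic space suits trees is to compare distances in the Poincaré disk with Euclidean ones. The sketch below is a minimal illustration, not the paper's experimental setup: it assumes the Poincaré disk model with curvature $-1$, and the function names `poincare_distance` and `tree_likeness` are hypothetical. It places two points at radius $r$ on perpendicular rays and compares their direct distance with the length of the path through the origin. In a tree, the path between two leaves passes through their common ancestor, so a ratio close to 1 indicates tree-like behaviour.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance in the Poincare disk (curvature -1 assumed)."""
    diff_sq = sum((a - b) ** 2 for a, b in zip(u, v))
    norm_u_sq = sum(a * a for a in u)
    norm_v_sq = sum(b * b for b in v)
    arg = 1.0 + 2.0 * diff_sq / ((1.0 - norm_u_sq) * (1.0 - norm_v_sq))
    return math.acosh(arg)


def tree_likeness(r, dist):
    """Ratio of the direct distance to the path length through the origin."""
    origin, u, v = (0.0, 0.0), (r, 0.0), (0.0, r)
    return dist(u, v) / (dist(u, origin) + dist(origin, v))


if __name__ == "__main__":
    for r in (0.5, 0.9, 0.99):
        hyperbolic = tree_likeness(r, poincare_distance)
        euclidean = tree_likeness(r, math.dist)
        print("r =", r,
              "hyperbolic ratio =", round(hyperbolic, 3),
              "euclidean ratio =", round(euclidean, 3))
```

In the Euclidean plane the ratio stays fixed at $1/\sqrt{2}$ for every $r$, while in the Poincaré disk it increases toward 1 as the points approach the boundary.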

Theorems & Definitions (7)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Lemma 3.4: balestrierocharacterizing
  • Definition A.1
  • Definition A.2
  • Definition A.3