What's in an embedding? Would a rose by any embedding smell as sweet?

Venkat Venkatasubramanian

TL;DR

The paper investigates whether LLMs truly understand language and argues that their internal representations are geometry-like rather than purely algebraic. It proposes a paradigm shift toward Large Knowledge Models that integrate geometric embeddings with symbolic, algebraic knowledge to enable deep reasoning, explanations, and safer generalization. By reviewing internal representations, symbolic AI history, and neuro-symbolic examples like AlphaGeometry and RLHF-based guidance, it highlights a path to hybrid systems that combine data-driven insight with first-principles reasoning. The central claim is that LKMs can extend AI capabilities to science and engineering domains, reducing data needs while enhancing interpretability and reliability. This perspective advocates a fundamental rethinking from LLM-centric AI to integrated knowledge-centric architectures.

Abstract

Large Language Models (LLMs) are often criticized for lacking true "understanding" and the ability to "reason" with their knowledge, and are seen as mere autocomplete systems. We believe that this assessment might be missing a nuance. We suggest that LLMs do develop a kind of empirical "understanding" that is "geometry"-like, which seems adequate for a range of applications in NLP, computer vision, coding assistance, etc. However, because this "geometric" understanding is built from incomplete and noisy data, LLMs are unreliable, generalize poorly, and lack inference capabilities and explanations, much like the heuristics-based expert systems of decades ago. To overcome these limitations, we suggest that LLMs should be integrated with an "algebraic" representation of knowledge that includes symbolic AI elements used in expert systems. This integration aims to create large knowledge models (LKMs) that not only possess "deep" knowledge grounded in first principles, but can also reason and explain, mimicking human expert capabilities. To harness the full potential of generative AI safely and effectively, a paradigm shift is needed from LLMs to more comprehensive LKMs.

Paper Structure

This paper contains 5 sections, 2 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Geometric representation of a circle
  • Figure 2: Points on the circle
  • Figure 3: Intersecting circles
  • Figure 4: Geometric representation of a circle: (a) 20 points with noise (b) 100 points with noise (c) 1000 points with noise
  • Figure 5: "Geometric" representation of features in Claude 3 Sonnet (Templeton et al., 2024)
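
The contrast between a "geometric" and an "algebraic" representation suggested by the circle figures can be illustrated with a small sketch. It is not the paper's code: the function names, the Gaussian noise model, the noise level of 0.1, the known center at the origin, and the mean-distance estimator are all assumptions made for illustration. The "geometric" view knows the circle only through noisy sample points (20, 100, or 1000 of them, as in Figure 4), whereas the "algebraic" view is the exact relation x^2 + y^2 = r^2.

```python
import math
import random


def sample_noisy_circle(n, radius=1.0, noise=0.1, seed=0):
    """Return n (x, y) points scattered around a circle centered at the origin.

    Assumption: noise is Gaussian and applied to the radial distance only.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r = radius + rng.gauss(0.0, noise)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


def estimate_radius(points):
    """'Geometric' knowledge: estimate the radius as the mean distance to the origin."""
    return sum(math.hypot(x, y) for x, y in points) / len(points)


def on_circle(x, y, radius=1.0, tol=1e-9):
    """'Algebraic' knowledge: test x^2 + y^2 = r^2 directly, with no data needed."""
    return abs(x * x + y * y - radius * radius) <= tol


if __name__ == "__main__":
    # Same sample sizes as the panels of Figure 4.
    for n in (20, 100, 1000):
        pts = sample_noisy_circle(n)
        print(n, round(estimate_radius(pts), 3))
    print(on_circle(0.6, 0.8))
```

Under these assumptions, the data-driven estimate only approximates the true radius and depends on how many points are available and how noisy they are, while the algebraic test answers membership for any point exactly. This mirrors, in a toy setting, the paper's argument for pairing empirical representations with first-principles knowledge.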