Attention's Gravitational Field: A Power-Law Interpretation of Positional Correlation

Edward Zhang

TL;DR

This work decouples positional encodings from semantic embeddings to optimize the model architecture and introduces the concept of the Attention Gravitational Field (AGF), demonstrating its intrinsic consistency with learning and stability curves.

Abstract

This paper explores the underlying principles of positional relationships and encodings within Large Language Models (LLMs) and introduces the concept of the Attention Gravitational Field (AGF). By decoupling positional encodings from semantic embeddings, we optimize the model architecture and achieve superior accuracy compared to prevailing encoding methods. Furthermore, we provide an in-depth analysis of AGF, demonstrating its intrinsic consistency with learning and stability curves, as well as its empirical alignment with Newton's Law of Universal Gravitation. By offering a rigorous theoretical exploration of these phenomena, this work represents a significant step toward interpreting the Attention mechanism and unlocks new possibilities for future research in model optimization and interpretability.

Paper Structure

This paper contains 17 sections, 19 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: Decomposition
  • Figure 2: Part-of-Speech VS Attention
  • Figure 3: Frequency Distribution of Words Following 'beautiful'
  • Figure 4: Power vs Exp
  • Figure 5: Learning Curve
  • ...and 3 more figures