Attention's Gravitational Field: A Power-Law Interpretation of Positional Correlation
Edward Zhang
TL;DR
By decoupling positional encodings from semantic embeddings, we optimize the model architecture and introduce the concept of the Attention Gravitational Field (AGF), demonstrating its intrinsic consistency with learning and stability curves.
Abstract
This paper explores the principles underlying positional relationships and positional encodings in Large Language Models (LLMs) and introduces the concept of the Attention Gravitational Field (AGF). By decoupling positional encodings from semantic embeddings, we optimize the model architecture and achieve higher accuracy than prevailing positional encoding methods. We further provide an in-depth analysis of AGF, demonstrating its intrinsic consistency with learning and stability curves as well as its empirical alignment with Newton's Law of Universal Gravitation. By offering a rigorous theoretical exploration of these phenomena, this work takes a significant step toward interpreting the attention mechanism and opens new possibilities for future research in model optimization and interpretability.
