Vectoring Languages
Joseph Chen
TL;DR
The paper addresses the challenge of reconciling philosophical theories of language with modern AI language models by proposing a vectoring framework that treats language as a high-dimensional vector space $V_L$ and uses projections to extract interpretable attribute subspaces. It formalizes definitions, contrasts vectoring with practical vectoring, and connects to existing models and theories (e.g., Word2Vec, Transformers) through a linear-algebra-inspired lens. It argues that meanings reside in structured projections such as $W_{meaning}$ and introduces a taxonomy that maps words to subspace coordinates via functions $F$. The work aims to guide future research and experimental exploration by uniting philosophy with rapidly advancing AI models to accelerate scientific progress, while acknowledging current limitations and proposing directions for agent-based updating and multi-perspective integration.
Abstract
Recent breakthroughs in large language models (LLMs) have drawn global attention, and research in the area has accelerated ever since. Philosophers and psychologists have studied the structure of language for decades, but they have struggled to find a theory that directly benefits from these breakthroughs. In this article, we propose a novel structure of language that reflects the mechanisms behind language models, and we show that this structure also captures the diverse nature of language better than previous methods. We adopt an analogy to linear algebra to strengthen the basis of this perspective. We further discuss how this perspective differs from the design philosophy of current language models. Lastly, we discuss how this perspective can point to the research directions most likely to accelerate scientific progress.
