Language Models Don't Learn the Physical Manifestation of Language
Bruce W. Lee, JaeHyuk Lim
TL;DR
This work investigates whether language-only models can truly grasp the physical manifestation of language by introducing H-Test, a battery of visuospatial and auditory tasks. Across a range of language-only models, results cluster near random performance, suggesting strong sensory-grounding blind spots that such models do not readily bridge through scaling or in-context learning. Some multimodal systems (e.g., GPT-4o, Claude 3 Opus) show improved performance on parts of H-Test, indicating that sensory grounding or architectural differences may enable solving the tasks, though the exact mechanisms remain unclear. Grounded in the Mary’s Room analogy, the paper argues for the necessity of sensory experience or alternative architectures to achieve robust, human-like language understanding, and it highlights multiple limitations and future directions for grounding language models in perception.
Abstract
We argue that language-only models don't learn the physical manifestation of language. We present an empirical investigation of the visual-auditory properties of language through a series of tasks, termed H-Test. These tasks highlight a fundamental gap between human linguistic understanding and the sensory-deprived linguistic understanding of LLMs. In support of our hypothesis, none of the following has a significant effect on H-Test performance: (1) deliberate reasoning (Chain-of-Thought), (2) few-shot examples, or (3) a stronger LLM from the same model family (LLaMA 2 13B -> LLaMA 2 70B). As a conceptual framework for understanding how language-only models learn about the world, we bring in the philosophical case of Mary, who learns about the world in a sensory-deprived environment (Jackson, 1986). Our experiments show that some of the strongest proprietary LLMs stay near the random-chance baseline accuracy of 50%, highlighting the limitations of linguistic knowledge acquired in the absence of sensory experience. Our code and data are available at <github.com/brucewlee/h-test>.
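To make "near the random-chance baseline of 50%" concrete, one option is an exact two-sided binomial test of a model's correct-answer count against chance. The sketch below is illustrative only: the function name and the counts in the usage lines are hypothetical, and it is not necessarily the significance procedure used in the paper.

```python
from math import comb


def binomial_two_sided_p(correct: int, total: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value against a chance-level accuracy.

    Sums the probabilities of all outcomes that are no more likely than
    the observed count under the null hypothesis accuracy == chance.
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    # Probability of each possible number of correct answers under the null.
    pmf = [
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(total + 1)
    ]
    observed = pmf[correct]
    # Small relative tolerance so ties are not lost to floating-point error.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


# Hypothetical counts for illustration only (not results from the paper).
for n_correct in (52, 60, 70):
    p = binomial_two_sided_p(n_correct, 100)
    print(f"{n_correct}/100 correct: p = {p:.4f}")
```

A large p-value means the observed accuracy is statistically indistinguishable from guessing on a two-choice task; a small one means the model is doing something other than guessing, in either direction.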
