A Philosophical Introduction to Language Models - Part II: The Way Forward
Raphaël Millière, Cameron Buckner
TL;DR
This second part of a two-part survey examines philosophical questions raised by large language models (LLMs), focusing on moving beyond behavioral benchmarks to uncover internal mechanisms through interventionist methods. It presents case studies on induction heads, modular addition, and world models to show that LLMs instantiate structured, causally relevant representations, challenging the view that they merely memorize. The paper also discusses recent trends, such as multimodal and agent-based architectures, and their philosophical implications for grounding, consciousness, and scientific legitimacy, while advocating for openness and reproducibility. It argues for a cautious middle ground: LLMs are useful partial models of certain cognitive processes but do not yet constitute full cognitive or conscious agents, which underscores the importance of rigorous methodology and interdisciplinary collaboration. Overall, the work highlights both the promise and the limits of LLMs as tools for understanding intelligence and calls for further research into their internal computations and their role in cognitive science.
Abstract
In this paper, the second of two companion pieces, we explore novel philosophical questions raised by recent progress in large language models (LLMs) that go beyond the classical debates covered in the first part. We focus particularly on issues related to interpretability, examining evidence from causal intervention methods about the nature of LLMs' internal representations and computations. We also discuss the implications of multimodal and modular extensions of LLMs, recent debates about whether such systems may meet minimal criteria for consciousness, and concerns about secrecy and reproducibility in LLM research. Finally, we discuss whether LLM-like systems may be relevant to modeling aspects of human cognition, if their architectural characteristics and learning scenario are adequately constrained.
