Position: Stop Acting Like Language Model Agents Are Normal Agents
Elija Perrier, Michael Timothy Bennett
TL;DR
The paper argues that Language Model Agents (LMAs) are not normal agents because intrinsic LLM pathologies undermine their identifiability, continuity, persistence, and consistency. It introduces ontological identity conditions for agents and shows how statelessness, stochasticity, semantic sensitivity, and linguistic intermediation compromise these conditions, even when LMAs are augmented with memory, tools, and planning modules. The authors advocate for agentic evaluations, including mechanistic interpretability and external identity assessments, to quantify and manage risks to agentic identity in LMAs across their life cycle. The work emphasizes a practical, measurement-driven approach to deployment, aiming to maximize utility while acknowledging and mitigating the fundamental ontological limitations of LMAs. This perspective encourages rethinking deployment strategies and developing rigorous metrics to ensure robustness, trust, and safety in real-world applications.
Abstract
Language Model Agents (LMAs) are increasingly treated as capable of autonomously navigating interactions with humans and tools. Their design and deployment tend to presume they are normal agents capable of sustaining coherent goals, adapting across contexts and acting with a measure of intentionality. These assumptions are critical to prospective use cases in industrial, social and governmental settings. But LMAs are not normal agents. They inherit the structural problems of the large language models (LLMs) around which they are built: hallucinations, jailbreaking, misalignment and unpredictability. In this Position paper we argue LMAs should not be treated as normal agents, because doing so leads to problems that undermine their utility and trustworthiness. We enumerate pathologies of agency intrinsic to LMAs. Despite scaffolding such as external memory and tools, they remain ontologically stateless, stochastic, semantically sensitive, and linguistically intermediated. These pathologies destabilise the ontological properties of LMAs, including identifiability, continuity, persistence and consistency, problematising their claim to agency. In response, we argue LMA ontological properties should be measured before, during and after deployment so that the negative effects of pathologies can be mitigated.
