Towards Rationality in Language and Multimodal Agents: A Survey

Bowen Jiang; Yangxinyu Xie; Xiaomeng Wang; Yuan Yuan; Zhuoqun Hao; Xinyi Bai; Weijie J. Su; Camillo J. Taylor; Tanwi Mallick

TL;DR

This survey addresses how to build rational language and multimodal agents by defining four necessary axioms of rationality and examining how grounding, logical consistency, invariance, and preference orderability can be enhanced through multimodal inputs, external tools, and multi-agent collaboration. It surveys mechanisms such as retrieval-augmented generation, neuro-symbolic reasoning, system-2-like deliberation, and conformal risk controls to mitigate LLM limitations in real-world decision-making. The work highlights methods to extend working memory, enable deterministic tool use, unify cross-modal representations, and learn robust preferences, while noting evaluation gaps and proposing directions toward inherent rationality and richer multimodal multi-agent systems. It emphasizes the practical significance of rational, reliable AI in high-stakes domains and advocates coordinated research between AI scientists and cognitive scientists to advance principled, verifiable rationality in agents.

Abstract

This work discusses how to build more rational language and multimodal agents and what criteria define rationality in intelligent systems. Rationality is the quality of being guided by reason, characterized by decision-making that aligns with evidence and logical principles. It plays a crucial role in reliable problem-solving by ensuring well-grounded and consistent solutions. Despite their progress, large language models (LLMs) often fall short of rationality due to their bounded knowledge space and inconsistent outputs. In response, recent efforts have shifted toward developing multimodal and multi-agent systems, and toward integrating modules such as external tools, program code, symbolic reasoners, utility functions, and conformal risk controls, rather than relying solely on a single LLM for decision-making. This paper surveys state-of-the-art advancements in language and multimodal agents, assesses their role in enhancing rationality, and outlines open challenges and future research directions. We maintain an open repository at https://github.com/bowen-upenn/Agent_Rationality.
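
As a rough illustration of one of the four axioms the survey names, orderability of preference, the sketch below checks whether a set of pairwise preferences can be arranged into a consistent ranking. It assumes the usual decision-theoretic reading of orderability as completeness plus transitivity; that formalization, the function name `is_orderable`, and the example options are illustrative assumptions, not taken from the paper.

```python
from itertools import permutations


def is_orderable(options, prefers):
    """Return True if `prefers` is complete and transitive over `options`.

    `prefers` is a set of (a, b) pairs meaning "a is weakly preferred to b".
    Completeness and transitivity are an assumed formalization of
    "orderability of preference", not a definition quoted from the survey.
    """
    # Completeness: every pair of distinct options is comparable
    # in at least one direction.
    for a, b in permutations(options, 2):
        if (a, b) not in prefers and (b, a) not in prefers:
            return False
    # Transitivity: a >= b and b >= c must imply a >= c.
    for a, b, c in permutations(options, 3):
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers:
            return False
    return True


# Hypothetical pairwise judgments, e.g. collected by asking an agent
# to compare two candidate answers at a time.
options = ["A", "B", "C"]
consistent = {("A", "B"), ("B", "C"), ("A", "C")}
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}

print(is_orderable(options, consistent))
print(is_orderable(options, cyclic))
```

A cyclic set such as A over B, B over C, C over A fails the transitivity check, which is the kind of inconsistency an agent's pairwise comparisons could exhibit when its preferences are not orderable.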

Paper Structure (32 sections, 3 figures)

Introduction
Scope
Defining Rationality in Agents
Information Grounding
Logical Consistency
Invariance from Irrelevant Context
Orderability of Preference
Towards Rationality in Agents
Advancing Information Grounding
Grounding on multimodal information
Expanding working memory from external knowledge retrieval and tool utilization
Advancing Logical Consistency
Consensus from reflection and multi-agent collaboration
Consistent execution from symbolic reasoning and tool utilization
Advancing Invariance from Irrelevant Information
...and 17 more sections

Figures (3)

Figure 1: This survey identifies four necessary, though not sufficient, axioms that a rational agent should fulfill. Meanwhile, we reinterpret various research domains related to agents and agent systems through the lens of rationality, examining how their underlying algorithms contribute to each of these axioms.
Figure 2: The evolutionary tree of language and multimodal agents and agent systems related to the four key axioms of agent rationality. The axioms are listed at the bottom, and each colored arrow represents a distinct research domain. Works involving multiple modalities are highlighted in bold.
Figure 3: Overview of how language and multimodal agents promote the four axioms of rationality. (1) Top Left - Advancing Information Grounding: Multimodal inputs enhance an agent's understanding of decision contexts and expand its functionalities; external knowledge sources and tools such as program code expand its bounded working memory. (2) Top Right - Advancing Logical Consistency: Multi-agent collaboration facilitates deliberate thinking that can correct errors and achieve consensus; neuro-symbolic reasoning and tools ensure consistent, deterministic execution. (3) Bottom Left - Advancing Invariance from Irrelevant Information: Cross-modal training unifies representations across modalities; neuro-symbolic tools focus the agent on the logical essence of a problem. (4) Bottom Right - Advancing Orderability of Preference: Reinforcement from AI feedback mimics humans and provides more stable preference scores; utility functions and conformal risk control further guide preferences within rigorous frameworks.
