From Language to Action: A Review of Large Language Models as Autonomous Agents and Tool Users

Sadia Sultana Chowa, Riasad Alvi, Subhey Sadi Rahman, Md Abdur Rahman, Mohaimenul Azam Khan Raiaan, Md Rafiqul Islam, Mukhtar Hussain, Sami Azam

TL;DR

The paper surveys recent advances (2023–2025) in LLMs as autonomous agents and tool users, proposing a comprehensive taxonomy across architectures, tool integration, cognition, prompting, and evaluation. It synthesizes evidence on single- and multi-agent frameworks, reasoning, planning, and memory, and analyzes how prompting, fine-tuning, and memory augmentation affect agent autonomy and grounding. A key contribution is a critical assessment of current benchmarks and 68 public datasets, highlighting gaps in verifiable reasoning, self-improvement, and personalization, and outlining ten future directions. The findings underscore that external tool access, structured memory, and hybrid prompting/fine-tuning strategies are central to scalable, safe, and effective agent systems with broad domain impact (healthcare, biology, engineering, robotics). Overall, the review provides a foundation for advancing robust, interpretable, and human-centered LLM-based agents.

Abstract

The pursuit of human-level artificial intelligence (AI) has significantly advanced the development of autonomous agents and Large Language Models (LLMs). LLMs are now widely used as decision-making agents because they can interpret instructions, manage sequential tasks, and adapt through feedback. This review examines recent developments in employing LLMs as autonomous agents and tool users and is organized around seven research questions. We included only papers published between 2023 and 2025 in A*- and A-ranked conferences and Q1 journals. We present a structured analysis of the architectural design principles of LLM agents, dividing their applications into single-agent and multi-agent systems, and of strategies for integrating external tools. We also investigate the cognitive mechanisms of LLMs, including reasoning, planning, and memory, and the impact of prompting methods and fine-tuning procedures on agent performance. Furthermore, we evaluate current benchmarks and assessment protocols and analyze 68 publicly available datasets used to assess the performance of LLM-based agents across various tasks. In conducting this review, we identified critical findings on the verifiable reasoning of LLMs, their capacity for self-improvement, and the personalization of LLM-based agents. Finally, we discuss ten future research directions to address the gaps these findings reveal.

Paper Structure

This paper contains 47 sections, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: An overview of the taxonomy used in this review.
  • Figure 2: Inclusion and exclusion criteria for article selection.
  • Figure 3: (A) Flow diagram illustrating the distribution of selected articles across conferences and journals. (B) Bar chart showing the monthly publication trends from 2023 to 2025.
  • Figure 4: An illustration of the LOMAR framework.
  • Figure 5: A general overview of a multi-agent LLM system. Here, three agents operate within a multimodal environment where they act, generate results, and exchange feedback. Each agent is equipped with internal modules (brain), including memory, reasoning, and planning, that guide its behavior. Through collaborative communication, agents perceive the environment, coordinate strategies, and ultimately take action.
  • ...and 2 more figures