Fundamentals of Building Autonomous LLM Agents

Victor de Lamo Castrillo, Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll

TL;DR

The paper surveys how to construct autonomous LLM agents by integrating perception, memory, reasoning, planning, and execution into modular architectures. It highlights methods such as Chain-of-Thought and Tree-of-Thought, along with DPPM-style decomposition and reflection, and emphasizes the role of expert ensembles in scaling reasoning. It reviews perception modalities (text, multimodal, structured data, and tool-based), memory strategies (RAG and long-term storage), and multimodal execution (UI automation, code, and robotics), noting challenges like GUI grounding, latency, and context-window limits. The work underscores the practical significance of modular, memory-aware LLM agents for complex, real-world automation and decision-making, while indicating avenues for future enhancement, including one-shot learning and human-in-the-loop setups.

Abstract

This paper reviews the architecture and implementation methods of agents powered by large language models (LLMs). Motivated by the limitations of traditional LLMs in real-world tasks, the research aims to explore patterns to develop "agentic" LLMs that can automate complex tasks and bridge the performance gap with human capabilities. Key components include a perception system that converts environmental percepts into meaningful representations; a reasoning system that formulates plans, adapts to feedback, and evaluates actions through different techniques like Chain-of-Thought and Tree-of-Thought; a memory system that retains knowledge through both short-term and long-term mechanisms; and an execution system that translates internal decisions into concrete actions. This paper shows how integrating these systems leads to more capable and generalized software bots that mimic human cognitive processes for autonomous and intelligent behavior.
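The four systems named in the abstract can be pictured as one control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: every name in it (`Memory`, `perceive`, `reason`, `execute`, `run_agent`, `stub_llm`) is hypothetical, the model call is replaced by a stub, and a fixed-size buffer stands in for a context-window limit.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Memory:
    """Hypothetical memory system: a bounded short-term buffer plus a long-term log."""
    short_term: List[str] = field(default_factory=list)
    long_term: List[str] = field(default_factory=list)
    window: int = 3  # assumed cap, standing in for a context-window limit

    def remember(self, item: str) -> None:
        self.long_term.append(item)
        self.short_term.append(item)
        # Keep only the most recent `window` items in the short-term buffer.
        self.short_term = self.short_term[-self.window:]


def perceive(raw: str) -> str:
    """Perception: normalize a raw percept into a simple text representation."""
    return raw.strip().lower()


def reason(observation: str, memory: Memory, llm: Callable[[str], str]) -> str:
    """Reasoning: build a prompt from recent memory and ask the model for an action."""
    lines = memory.short_term + [f"observation: {observation}", "next action:"]
    return llm("\n".join(lines))


def execute(action: str) -> str:
    """Execution: placeholder for a concrete action (UI event, tool call, ...)."""
    return f"executed {action}"


def run_agent(percepts: List[str], llm: Callable[[str], str]) -> Tuple[List[str], Memory]:
    """Run one perceive -> reason -> execute -> remember cycle per percept."""
    memory = Memory()
    results = []
    for raw in percepts:
        observation = perceive(raw)
        action = reason(observation, memory, llm)
        results.append(execute(action))
        memory.remember(f"{observation} -> {action}")
    return results, memory


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call: reacts only to the latest observation."""
    observations = [l for l in prompt.splitlines() if l.startswith("observation: ")]
    return "click" if "button" in observations[-1] else "wait"


if __name__ == "__main__":
    results, memory = run_agent(["  A Button appeared ", "Loading..."], stub_llm)
    print(results)
    print(memory.short_term)
```

Keeping each system behind its own function mirrors the modular framing of the abstract: the stub could be swapped for a real model client, or the buffer for a retrieval-backed store, without touching the loop itself.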

Paper Structure

This paper contains 49 sections and 5 figures.

Figures (5)

  • Figure 1: Key Components of an Agent's LLM Architecture
  • Figure 2: Architecture of Multimodal Large Language Models (MM-LLMs) for Understanding and Generation [zhang2024mmllmsrecentadvancesmultimodal]
  • Figure 5: Comparison of different types of planning frameworks, including sequential decomposition-planning, interleaved decomposition-planning, and DPPM [lu2025decomposeplanparallel]
  • Figure 7: Flowchart of a Reasoning System Using the Decompose, Plan in Parallel, and Merge (DPPM) Approach with a Reflection System
  • Figure 8: Example of the communication between agents in a multi-agent system