Autonomous Deep Agent

Amy Yu, Erik Lebedev, Lincoln Everett, Xiaoxin Chen, Terry Chen

TL;DR

Deep Agent tackles the challenge of autonomously handling complex, multi-phase tasks with dynamic dependencies in real-time contexts. It combines a Hierarchical Task DAG with a two-stage planner-executor, Autonomous API & Tool Creation, and a Prompt Tweaking Engine with Autonomous Prompt Feedback Learning to enable end-to-end task orchestration, dynamic adaptation, and self-improvement. Key contributions include recursive task decomposition via HTDAG, automatic generation of reusable APIs/tools, prompt optimization to maintain inference stability, test-time computation with validation, and continuous prompt feedback learning for self-improvement. The approach promises reduced operational costs, greater resilience to disruptions, and scalable autonomous workflow execution suitable for industry automation.

Abstract

This technical brief introduces Deep Agent, an advanced autonomous AI system designed to manage complex multi-phase tasks through a novel hierarchical task management architecture. The system's foundation is built on our Hierarchical Task DAG (HTDAG) framework, which dynamically decomposes high-level objectives into manageable sub-tasks while rigorously maintaining dependencies and execution coherence. Deep Agent advances beyond traditional agent systems through three key innovations: First, it implements a recursive two-stage planner-executor architecture that enables continuous task refinement and adaptation as circumstances change. Second, it features an Autonomous API & Tool Creation (AATC) system that automatically generates reusable components from UI interactions, substantially reducing operational costs for similar tasks. Third, it incorporates Prompt Tweaking Engine and Autonomous Prompt Feedback Learning components that optimize Large Language Model prompts for specific scenarios, enhancing both inference accuracy and operational stability. These components are integrated to form a service infrastructure that manages user contexts, handles complex task dependencies, and orchestrates end-to-end agentic workflow execution. Through this sophisticated architecture, Deep Agent establishes a novel paradigm in self-governing AI systems, demonstrating robust capability to independently handle intricate, multi-step tasks while maintaining consistent efficiency and reliability through continuous self-optimization.
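As a rough illustration of the hierarchical decomposition described above, the sketch below models a task as a node that may carry its own sub-DAG and derives a leaf-level execution order that respects dependencies. The names `TaskNode` and `execution_order` and the example task names are hypothetical and are not taken from the paper; the paper's planner builds such graphs dynamically with an LLM, whereas this sketch uses a fixed, hand-written graph.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass
class TaskNode:
    """One node of a hypothetical hierarchical task DAG."""
    name: str
    deps: list[str] = field(default_factory=list)  # sibling tasks that must finish first
    children: list["TaskNode"] = field(default_factory=list)  # nested sub-DAG, if decomposed


def execution_order(nodes: list[TaskNode]) -> list[str]:
    """Return leaf tasks in an order that respects dependencies at every level."""
    by_name = {node.name: node for node in nodes}
    sorter = TopologicalSorter({node.name: node.deps for node in nodes})
    order: list[str] = []
    for name in sorter.static_order():
        node = by_name[name]
        if node.children:
            # A decomposed task is replaced by the ordered leaves of its sub-DAG.
            order.extend(execution_order(node.children))
        else:
            order.append(name)
    return order


# Illustrative graph loosely modeled on the deals-monitoring example.
tasks = [
    TaskNode(
        "search_nike",
        children=[
            TaskNode("open_nike_site"),
            TaskNode("collect_nike_deals", deps=["open_nike_site"]),
        ],
    ),
    TaskNode("search_adidas"),
    TaskNode("integrated_report", deps=["search_nike", "search_adidas"]),
]

order = execution_order(tasks)
print(order)
```

The report task always comes last, and the two Nike leaves keep their relative order; the ordering between independent branches is left to the sorter.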

Paper Structure

This paper contains 7 sections, 8 figures.

Figures (8)

  • Figure 1: Deep Agent system overview illustrating hierarchical task decomposition and workflow management. The system demonstrates two representative tasks: finding the best preschool (Task 1) and monitoring shopping deals (Task 2). Each task is managed by a dedicated Task Manager that constructs Hierarchical Task DAGs (HTDAGs) to model sub-tasks and their dependencies. The system integrates with users through a "User Profile & Context Service" that enables personalization and runtime interactions (e.g., control and dynamic feedback, progress updates, and copilot functionality). Task execution is facilitated through three core services: an "LLM Backend" for reasoning, an "Info & Search Service" for information gathering, and an "Action, API & Tool Service" for end execution. The system implements two key feedback mechanisms: 1) runtime learning of user preferences for profile enhancement, and 2) autonomous creation of new APIs and tools to expand system capabilities.
  • Figure 2: Two-stage planner-executor architecture illustrated through a Nike deals monitoring example. This architecture decomposes a high-level task into sub-tasks, each handled by a dedicated planner-executor pipeline, enabling recursive task refinement, as shown by the sub-task "Investigate Nike website ..." being further decomposed through another planner-executor cycle. At each cycle, the planner dynamically constructs the sub-task DAG for the workflow as well as it can given all information available at that moment (see also Figure 3), while the executor manages tactical execution. Upon failures or disruptions (such as additional constraints from the user, or direct user intervention), re-planning is triggered and takes into account both the execution state and the disruption. Unresolved disruptions cascade to parent nodes for higher-level re-planning. This design allows sophisticated task management and resilient handling of complex, nested workflows.
  • Figure 3: Examples of dynamic DAG construction and adaptation during task execution. (a) The planner initially generates a plan to "search Nike, ...", "search Adidas ...", and then generate an integrated report on promotions, deals, and product comparison. During report generation, user input ("I also want Jordan shoes") triggers graph expansion: a new task node is created specifically for "the Jordan brand" and the "integrated report" node is re-run, while all existing "search Nike, ..." and "search Adidas ..." nodes are preserved. (b-1) For the thread investigating Nike brand promotions & deals, task DAG construction is dynamic, as new UI elements become available during the process. (b-2) If a user copilot interaction occurs during the process (assumed in this diagram to happen at the confirmation step) and the user signals "I actually wanted Jordan brand", re-planning is triggered. It affects the current DAG (confirm with the user again, then either 1) terminate the current DAG if they only wanted the Jordan brand, or 2) re-plan the current DAG given the latest status after the copilot interaction) and also falls back to the parent graph to create a new node for the "Jordan brand" task.
  • Figure 4: Autonomous API & Tool Creation (AATC) framework and examples. The system analyzes target UIs through Deep Agent and Task Simulator to automatically generate new APIs and composite tools. Left: The framework's closed-loop process where Deep Agent analyzes UI functionalities and creates new APIs/tools, which are then verified through task simulation. Right: Example outputs including automatically generated APIs (top) covering core UI functionalities with proper parameter specifications, and synthesized composite tools (bottom) that chain APIs into higher-level workflows. Shown examples include a Purchase Assistant tool that coordinates product search, comparison, and checkout operations, and a Feedback Assistant tool that manages user feedback and customer service interactions. This autonomous creation process enables continuous and scalable expansion of the system's capabilities through UI analysis.
  • Figure 5: The Prompt Tweaking Engine to reduce irrelevant instructions and rules. The engine processes input prompts containing generic instructions and rules (left), applying LLM-based analysis and retrieval techniques to produce optimized task-specific prompts (right). This process removes irrelevant instructions while preserving essential context and rules for the specific task scenario.
  • ...and 3 more figures
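
The graph expansion described in the Figure 3(a) caption can be sketched as follows: a new node is added mid-execution, the downstream report node is invalidated so that it re-runs, and completed search nodes are left untouched. This is a minimal sketch under stated assumptions; the `Dag` class, its methods, and the task names are hypothetical, and the graph is assumed to stay acyclic. It is not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Dag:
    """Hypothetical task graph: `deps` maps a task to its prerequisites."""
    deps: dict[str, set[str]] = field(default_factory=dict)
    done: set[str] = field(default_factory=set)

    def add_task(self, name, deps=(), feeds=()):
        """Add a node; every existing node it feeds must run again."""
        self.deps[name] = set(deps)
        for target in feeds:
            self.deps[target].add(name)
            self._invalidate(target)

    def _invalidate(self, name):
        """Mark a node, and everything downstream of it, as not done."""
        self.done.discard(name)
        for other, prerequisites in self.deps.items():
            if name in prerequisites:
                self._invalidate(other)

    def ready(self):
        """Tasks that are not done and whose prerequisites are all done."""
        return sorted(
            task
            for task, prerequisites in self.deps.items()
            if task not in self.done and prerequisites <= self.done
        )


dag = Dag()
dag.add_task("search_nike")
dag.add_task("search_adidas")
dag.add_task("integrated_report", deps=["search_nike", "search_adidas"])
dag.done.update(["search_nike", "search_adidas", "integrated_report"])

# User input arrives: "I also want Jordan shoes".
dag.add_task("search_jordan", feeds=["integrated_report"])
print(dag.ready())  # ['search_jordan']

dag.done.add("search_jordan")
print(dag.ready())  # ['integrated_report']
```

Only the new node and the report node are scheduled again; the Nike and Adidas results stay marked as done, which mirrors the caption's point that existing search nodes are preserved.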