Explainable Behavior Cloning: Teaching Large Language Model Agents through Learning by Demonstration

Yanchu Guan, Dong Wang, Yan Wang, Haiqing Wang, Renen Sun, Chenyi Zhuang, Jinjie Gu, Zhixuan Chu

TL;DR

The authors propose EBC-LLMAgent (Explainable Behavior Cloning LLM Agent), a novel approach that combines large language models with behavior cloning from demonstrations to create intelligent, explainable agents for autonomous mobile app interaction.

Abstract

Autonomous mobile app interaction has become increasingly important with the growing complexity of mobile applications. Developing intelligent agents that can effectively navigate and interact with mobile apps remains a significant challenge. In this paper, we propose an Explainable Behavior Cloning LLM Agent (EBC-LLMAgent), a novel approach that combines large language models (LLMs) with behavior cloning by learning from demonstrations to create intelligent and explainable agents for autonomous mobile app interaction. EBC-LLMAgent consists of three core modules: Demonstration Encoding, Code Generation, and UI Mapping, which work together to capture user demonstrations, generate executable code, and establish an accurate correspondence between code and UI elements. We introduce the Behavior Cloning Chain Fusion technique to enhance the generalization capabilities of the agent. Extensive experiments on five popular mobile applications from diverse domains demonstrate the superior performance of EBC-LLMAgent, achieving high success rates in task completion, efficient generalization to unseen scenarios, and the generation of meaningful explanations.
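
The abstract names the three modules but gives no implementation details. Purely as an illustration of how they could fit together, the Python sketch below assumes a toy design that is not the paper's: a demonstration is recorded as tap and text-entry events on labelled UI elements; Demonstration Encoding rewrites them as label-based steps; a string template stands in for the LLM in Code Generation; and UI Mapping resolves each label against the elements on the current screen. All names (`UIElement`, `Event`, `encode_demonstration`, `generate_code`, `map_to_ui`, `run_generated`) and the resource ids are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UIElement:
    resource_id: str  # hypothetical id, e.g. taken from the screen's XML view hierarchy
    label: str        # text the user sees on screen


@dataclass(frozen=True)
class Event:
    action: str         # "tap" or "enter_text" in this toy model
    element: UIElement
    text: str = ""      # only used by "enter_text"


def encode_demonstration(events: list[Event]) -> list[str]:
    """Demonstration Encoding (toy): raw events -> label-based, readable steps."""
    steps = []
    for e in events:
        if e.action == "enter_text":
            steps.append(f"enter_text({e.element.label!r}, {e.text!r})")
        else:
            steps.append(f"tap({e.element.label!r})")
    return steps


def generate_code(task: str, steps: list[str]) -> str:
    """Code Generation (toy): a template stands in for the LLM and wraps the
    steps in a named function that a human can read and audit."""
    body = "\n".join(f"    {s}" for s in steps)
    return f"def {task}():\n{body}\n"


def map_to_ui(label: str, screen: list[UIElement]) -> str:
    """UI Mapping (toy): resolve a label used in the code to an element id."""
    for el in screen:
        if el.label.casefold() == label.casefold():
            return el.resource_id
    raise LookupError(f"no element labelled {label!r} on this screen")


def run_generated(code: str, task: str, screen: list[UIElement]) -> list[tuple]:
    """Execute the generated function, logging each action with its resolved id."""
    log: list[tuple] = []
    env = {
        "tap": lambda label: log.append(("tap", map_to_ui(label, screen))),
        "enter_text": lambda label, text: log.append(
            ("enter_text", map_to_ui(label, screen), text)),
    }
    exec(code, env)  # defines the generated function inside env
    env[task]()      # runs it; every call goes through map_to_ui
    return log


# Usage: record a demonstration on one screen, replay it on a re-laid-out screen.
recorded = [UIElement("app:id/msg_box", "Message"), UIElement("app:id/send", "Send")]
events = [Event("enter_text", recorded[0], "Hello"), Event("tap", recorded[1])]
steps = encode_demonstration(events)
code = generate_code("send_message", steps)

new_screen = [UIElement("app:id/send_v2", "Send"), UIElement("app:id/compose", "Message")]
actions = run_generated(code, "send_message", new_screen)
```

In this toy, keeping the generated code in terms of visible labels and deferring id lookup to the mapping step is what lets the same code run on a screen whose ids and layout have changed. Calling `exec` on model-written code is only acceptable in a sketch like this; a real agent would sandbox it or restrict the callable actions.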

Paper Structure

This paper contains 22 sections, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Real-world examples of Code Generation in EBC-LLMAgent across various mobile applications. Each represents a different task: (a) Ordering from McDonald's, (b) Booking a flight ticket, (c) Downloading a movie on YouTube, (d) Ordering from Starbucks, and (e) Sending a message on WhatsApp. These examples showcase the agent's ability to handle diverse tasks across different app interfaces, generating modular and interpretable code that bridges the gap between user intent and app-specific actions.
  • Figure 2: The framework of our proposed EBC-LLMAgent.
  • Figure 3: Task success rate (SR) tends to decrease as the number of steps increases.
  • Figure 4: An example of Demonstration Encoding.
  • Figure 5: An XML fragment.
  • ...and 4 more figures
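
Figure 3 reports the trend only. One simple way to build intuition for it, under an assumption that does not come from the paper, is to treat each step as succeeding independently with the same probability p, so that a task of n steps succeeds with probability p^n. The function name `task_success_rate` and the per-step value 0.95 are illustrative choices, not reported results.

```python
def task_success_rate(step_success: float, num_steps: int) -> float:
    """Chance that every step succeeds, assuming independent steps
    that each succeed with the same probability."""
    if not 0.0 <= step_success <= 1.0:
        raise ValueError("step_success must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_success ** num_steps


# Assumed per-step success of 0.95, chosen only for illustration.
for n in (1, 5, 10, 20):
    print(f"{n:>2} steps: task SR = {task_success_rate(0.95, n):.3f}")
```

Under this model a 20-step task succeeds only about 36% of the time even at 95% per-step accuracy. Real step failures are rarely independent, so this is a rough intuition rather than an explanation of the measured curve.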