LLM-Augmented Symbolic Reinforcement Learning with Landmark-Based Task Decomposition

Alireza Kheirandish, Duo Xu, Faramarz Fekri

TL;DR

The experiments show that the LLM-guided rule template generation can produce rules that are necessary for solving a subtask, which leads to solving complex tasks with fewer assumptions about predefined first-order logic predicates of the environment.

Abstract

One of the fundamental challenges in reinforcement learning (RL) is to decompose a complex task into subtasks that are simpler for the RL agent to learn. In this paper, we report on our work to identify subtasks from given positive and negative trajectories for solving the complex task. We assume that states are represented in first-order predicate logic, and on this basis we devise a novel algorithm to identify the subtasks. We then employ a Large Language Model (LLM) to generate first-order logic rule templates for achieving each subtask. These rules are then fine-tuned into a rule-based policy by an Inductive Logic Programming (ILP)-based RL agent. Through experiments, we verify the accuracy of our subtask-detection algorithm, which correctly detects all of the subtasks. We also investigate the quality of the common-sense rules that the language model produces for achieving the subtasks. Our experiments show that our LLM-guided rule template generation can produce rules that are necessary for solving a subtask, which leads to solving complex tasks with fewer assumptions about predefined first-order logic predicates of the environment.
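The abstract does not spell out how subtasks are identified from the positive and negative trajectories. As a purely illustrative sketch, and not the authors' algorithm, the Python below shows one simple contrastive heuristic under stated assumptions: each state is a set of ground atoms written as strings (atom names such as has_key are hypothetical), an atom counts as necessary if every positive trajectory reaches it, and atoms that also appear in more than a fraction max_neg_frac of negative trajectories (a hypothetical parameter) are discarded as uninformative. The surviving atoms are ordered by how early the positive trajectories first reach them, giving a candidate subtask sequence.

```python
from typing import FrozenSet, List, Set

# Hypothetical representation: a state is the set of ground atoms that hold
# in it, and a trajectory is the list of states visited in one episode.
State = FrozenSet[str]
Trajectory = List[State]


def atoms_seen(traj: Trajectory) -> Set[str]:
    """Atoms that hold in at least one state of the trajectory."""
    seen: Set[str] = set()
    for state in traj:
        seen |= state
    return seen


def first_step(atom: str, traj: Trajectory) -> int:
    """Index of the first state in which `atom` holds (atom must occur)."""
    return next(i for i, state in enumerate(traj) if atom in state)


def candidate_landmarks(positive: List[Trajectory],
                        negative: List[Trajectory],
                        max_neg_frac: float = 0.5) -> List[str]:
    """Atoms reached by every positive trajectory and by at most
    `max_neg_frac` of the negative ones, ordered by mean first-reach step.
    Requires at least one positive trajectory."""
    pos_sets = [atoms_seen(t) for t in positive]
    neg_sets = [atoms_seen(t) for t in negative]
    necessary = set.intersection(*pos_sets)

    def neg_frac(atom: str) -> float:
        if not neg_sets:
            return 0.0
        return sum(atom in s for s in neg_sets) / len(neg_sets)

    def mean_first_step(atom: str) -> float:
        return sum(first_step(atom, t) for t in positive) / len(positive)

    kept = [a for a in necessary if neg_frac(a) <= max_neg_frac]
    # Ties in timing are broken alphabetically so the output is deterministic.
    return sorted(kept, key=lambda a: (mean_first_step(a), a))


if __name__ == "__main__":
    S = frozenset
    pos = [
        [S({"alive"}), S({"alive", "has_key"}), S({"alive", "door_open"}),
         S({"alive", "at_goal"})],
        [S({"alive"}), S({"alive", "at_coin"}), S({"alive", "has_key"}),
         S({"alive", "door_open"}), S({"alive", "at_goal"})],
    ]
    neg = [
        [S({"alive"}), S({"alive", "has_key"})],
        [S({"alive"}), S({"alive", "at_coin"})],
    ]
    print(candidate_landmarks(pos, neg))  # ['has_key', 'door_open', 'at_goal']
```

In the toy data, alive is dropped because it holds in every negative trajectory, and at_coin is dropped because only one positive trajectory visits it. A strict "never in any negative" filter would also discard has_key, which a failed episode can reach without finishing the task; the tolerance parameter is there for that reason.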

Paper Structure

This paper contains 5 sections, 1 equation, 6 figures, 2 tables, and 1 algorithm.

Figures (6)

  • Figure 1: The schematic shows the graph generated by the algorithm for environments with three predicates. The root represents states containing all three predicates, and each subsequent level illustrates states formed by removing one predicate. Each edge indicates which predicate was removed at that node. The final leaves contain only one predicate.
  • Figure 2: Performance of landmark identification: The top-left plot shows contrastive learning results for all landmarks, and the bottom-right plot displays improvements after applying a tree graph search. Other plots focus on specific landmarks before the graph search. Recall improved from 83% to 100%, and precision increased from 10% to 100% with the search algorithm.
  • Figure 3: Comparison of algorithm convergence: the red plot shows performance with 4 subtasks, the blue plot with 3 subtasks, and the green plot the ILP-RL agent using a human expert's rule template.
  • Figure 4: Top image: prompt for generating the base template rule, including a constant section with few-shot examples from various environments and the specific coin subtask. Bottom image: few-shot learning applied to refine the template rule by generating more general rules.
  • Figure 5: Comparison of the human expert's rule policy with LLM-generated rules for the coin subtask. The final policy chosen by the ILP-RL agent is marked in red, demonstrating the effectiveness of subtasks in guiding smaller policy rules with fewer predicates or less environmental information.
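Figure 1 describes the predicate-removal graph only by its shape. The Python below is a hypothetical reconstruction of that shape, assuming each node is the set of predicates retained and each edge removes one predicate; identical subsets reached from different parents are merged, so strictly it is a DAG rather than a tree. Figure 2 reports that a tree graph search raised precision and recall, but the paper's search criterion is not given here, so smallest_valid_nodes uses a stand-in: a breadth-first search that descends only through nodes passing a caller-supplied test. The example test, separates (hypothetical), treats each state as the set of predicate names that hold in it (a propositional simplification) and requires that positive and negative states remain distinguishable after dropping the removed predicates.

```python
from collections import deque
from typing import Callable, Dict, FrozenSet, Iterable, List, Tuple

Node = FrozenSet[str]  # the set of predicates retained at a node
Graph = Dict[Node, List[Tuple[str, Node]]]  # node -> [(removed, child)]


def build_removal_graph(predicates: Iterable[str]) -> Graph:
    """Start from all predicates and remove one per edge; nodes with a single
    predicate are leaves. Shared children are stored once, so this is a DAG."""
    root = frozenset(predicates)
    graph: Graph = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if node in graph:
            continue  # already expanded via another parent
        children = []
        if len(node) > 1:
            for p in sorted(node):
                child = node - {p}
                children.append((p, child))
                queue.append(child)
        graph[node] = children
    return graph


def separates(pos_states: List[Node],
              neg_states: List[Node]) -> Callable[[Node], bool]:
    """Hypothetical validity test: after keeping only the predicates in a node,
    no positive state may look identical to a negative one."""
    def check(node: Node) -> bool:
        pos = {s & node for s in pos_states}
        neg = {s & node for s in neg_states}
        return pos.isdisjoint(neg)
    return check


def smallest_valid_nodes(graph: Graph, root: Node,
                         is_valid: Callable[[Node], bool]) -> List[Node]:
    """Breadth-first search from the root that only descends through valid
    nodes; returns the valid nodes with the fewest predicates."""
    if not is_valid(root):
        return []
    seen = {root}
    valid = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _removed, child in graph[node]:
            if child not in seen and is_valid(child):
                seen.add(child)
                valid.append(child)
                queue.append(child)
    fewest = min(len(n) for n in valid)
    return sorted((n for n in valid if len(n) == fewest), key=sorted)


if __name__ == "__main__":
    preds = ["has_key", "door_open", "at_coin"]
    graph = build_removal_graph(preds)
    print(len(graph))  # 7 nodes: 1 root, 3 pairs, 3 single-predicate leaves
    S = frozenset
    pos = [S({"has_key", "door_open"}), S({"has_key", "door_open", "at_coin"})]
    neg = [S({"has_key"}), S({"at_coin"}), S({"has_key", "at_coin"})]
    print(smallest_valid_nodes(graph, S(preds), separates(pos, neg)))
    # [frozenset({'door_open'})]
```

In the toy example, removing at_coin or has_key keeps the two classes apart, but only door_open survives as a single-predicate node, so the search returns it. Pruning at invalid nodes is sound for separates, since removing predicates can only merge states further, never split them apart; it also spares the search from expanding the full graph, which has 2^n - 1 non-empty subsets for n predicates.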