Sub-goal Distillation: A Method to Improve Small Language Agents

Maryam Hashemzadeh, Elias Stengel-Eskin, Sarath Chandar, Marc-Alexandre Côté

TL;DR

The paper tackles the cost and accessibility constraints of large LLMs in interactive, long-horizon tasks by distilling LLM planning knowledge into a hierarchical agent built from smaller language models. A high-level sub-goal generator, trained via Knowledge Distillation (KD) from an LLM, guides a low-level action generator that executes those sub-goals without real-time LLM queries, which reduces inference costs. In ScienceWorld, this hierarchical KD approach outperforms standard imitation-learning baselines and SwiftSage, solving more task types and generalizing better. The analysis also shows robustness to noisy sub-goals and examines the effect of model scale. The work highlights the practicality and scalability of deploying compact language models for complex decision-making tasks and outlines directions such as goal modification and multi-module extensions.
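The control flow described above, where a sub-goal generator proposes milestones and an action generator works through each one with no LLM in the loop, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. Both modules are plain callables here, although the paper uses fine-tuned small LMs. The `DONE` marker that the action generator emits to signal a finished sub-goal is hypothetical. The `max_calls` budget and the `history_len` window are also hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical marker the action generator emits when it judges the current
# sub-goal complete (an assumption of this sketch, not from the paper).
DONE = "<subgoal_done>"

History = List[Tuple[str, str]]  # (action, observation) pairs

# In the paper both modules are small fine-tuned LMs; here they are plain
# callables so the control flow can run on its own.
SubgoalGenerator = Callable[[str, List[str]], Optional[str]]
ActionGenerator = Callable[[str, str, History], str]
EnvStep = Callable[[str], Tuple[str, bool]]  # action -> (observation, episode finished)


def run_episode(
    task: str,
    subgoal_gen: SubgoalGenerator,
    action_gen: ActionGenerator,
    step: EnvStep,
    max_calls: int = 100,
    history_len: int = 5,
) -> Tuple[List[str], History]:
    """Hierarchical control loop; no large LLM is queried at any point."""
    completed: List[str] = []
    history: History = []
    calls = 0  # every action-generator call counts against the budget
    while calls < max_calls:
        # High level: next sub-goal from the task and the sub-goals done so far.
        subgoal = subgoal_gen(task, completed)
        if subgoal is None:  # planner says nothing is left to do
            break
        # Low level: emit elementary actions until the sub-goal is done.
        while calls < max_calls:
            calls += 1
            action = action_gen(task, subgoal, history[-history_len:])
            if action == DONE:
                completed.append(subgoal)
                break
            observation, finished = step(action)
            history.append((action, observation))
            if finished:  # environment ended the episode
                return completed, history
    return completed, history


# Toy usage: a fixed two-step plan and an "executor" that performs the
# sub-goal text as one action, then reports the sub-goal as done.
plan = ["go to kitchen", "open freezer"]
completed, history = run_episode(
    "freeze water",
    lambda task, done: plan[len(done)] if len(done) < len(plan) else None,
    lambda task, sg, hist: DONE if hist and hist[-1][0] == sg else sg,
    lambda action: (f"You {action}.", False),
)
print(completed)  # ['go to kitchen', 'open freezer']
```

Because the action generator conditions only on the current sub-goal and a short history window, it can stay small. The planner carries the long-horizon structure.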

Abstract

While Large Language Models (LLMs) have demonstrated significant promise as agents in interactive tasks, their substantial computational requirements and restricted number of calls constrain their practical utility, especially in long-horizon interactive tasks such as decision-making or in scenarios involving continuous ongoing tasks. To address these constraints, we propose a method for transferring the performance of an LLM with billions of parameters to a much smaller language model (770M parameters). Our approach involves constructing a hierarchical agent comprising a planning module, which learns through Knowledge Distillation from an LLM to generate sub-goals, and an execution module, which learns to accomplish these sub-goals using elementary actions. In detail, we leverage an LLM to annotate an oracle path with a sequence of sub-goals towards completing a goal. Subsequently, we utilize this annotated data to fine-tune both the planning and execution modules. Importantly, neither module relies on real-time access to an LLM during inference, significantly reducing the overall cost associated with LLM interactions to a fixed cost. In ScienceWorld, a challenging and multi-task interactive text environment, our method surpasses standard imitation learning based solely on elementary actions by 16.7% (absolute). Our analysis highlights the efficiency of our approach compared to other LLM-based methods. Our code and annotated data for distillation can be found on GitHub.
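To make the data step concrete, here is one way an oracle path annotated with sub-goals could be turned into training pairs for both modules. The planner learns "next sub-goal given the task and completed sub-goals". The executor learns "next action given the current sub-goal and recent history". This is a sketch, not the paper's pipeline. The `build_examples` helper is hypothetical. So are the `task: … | completed: …` prompt templates and the history window. The sub-goal strings follow the function-call style from Figure 1, but the actions and observations are made up for illustration.

```python
from typing import List, Tuple

Step = Tuple[str, str]              # (action, observation)
Segment = Tuple[str, List[Step]]    # (sub-goal, the expert steps it covers)
Example = Tuple[str, str]           # (model input, target output)


def build_examples(
    task: str, segments: List[Segment], history_len: int = 3
) -> Tuple[List[Example], List[Example]]:
    """Turn one sub-goal-annotated oracle path into seq2seq training pairs."""
    planner: List[Example] = []   # for the sub-goal generator
    executor: List[Example] = []  # for the action generator
    completed: List[str] = []
    history: List[Step] = []
    for subgoal, steps in segments:
        # Planner target: the next sub-goal, given the task and finished sub-goals.
        done = "; ".join(completed) or "none"
        planner.append((f"task: {task} | completed: {done}", subgoal))
        for action, observation in steps:
            # Executor target: the next action, given the current sub-goal and
            # a window of recent (action, observation) pairs.
            recent = " ".join(f"{a} -> {o}" for a, o in history[-history_len:])
            executor.append(
                (f"task: {task} | sub-goal: {subgoal} | history: {recent or 'none'}", action)
            )
            history.append((action, observation))
        completed.append(subgoal)
    return planner, executor


segments = [
    ("navigate_to(kitchen)", [("open door to kitchen", "The door is now open."),
                              ("go to kitchen", "You move to the kitchen.")]),
    ("fill(metal pot, water)", [("move metal pot to sink", "You move the metal pot to the sink.")]),
]
planner, executor = build_examples("freeze water", segments)
print(planner[1])
# ('task: freeze water | completed: navigate_to(kitchen)', 'fill(metal pot, water)')
```

A single annotated trajectory yields one planner example per sub-goal and one executor example per elementary action. Both sets come from the same fixed, one-time LLM annotation cost.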

Paper Structure (45 sections, 5 figures, 11 tables)

Figures (5)

  • Figure 1: Example of annotating an expert trajectory with sub-goals for a particular variation of task 1-4 (change-the-state-of-matter-of). Looking only at the original trajectory (i.e., ignoring the red rows), we can infer that the expert ended up freezing the water. The expert had to navigate to the kitchen, find a thermometer and a metal pot, pour water into the pot, place it in the freezer, and continually monitor its temperature until the water froze. Each of those milestones (highlighted in red) can be considered a sub-goal encompassing a sequence of actions. Sub-goals can be shared across different tasks, facilitating generalization. We opted for a format resembling function calls to encourage reusability (e.g., fill(metal pot, water)).
  • Figure 2: Left: a schematic view of our approach, which has two modules, the sub-goal generator and the action generator. The sub-goal generator provides a sub-goal to the action generator, which predicts the next action given the current sub-goal and the history. Right: the inputs and outputs of both modules. The input comprises several parts, including the task description, the completed sub-goals, the current sub-goal, a history of recent action-observation pairs, and more, each highlighted in a distinct color.
  • Figure 3: Example of a trajectory generated by the LLM that deviates from the provided expert trajectory. In this example, which is for a boiling task, certain actions are omitted from the generated trajectory; they are shown in blue in the left box. To recover these missing actions, we group them into sequences and prompt the LLM to generate sub-goals for them. If the generated trajectory includes additional actions, such as the green actions in the right box, we simply remove them to align with the expert trajectory.
  • Figure 4: Using KD to generate sub-goals with an LLM. The LLM is given a prompt containing two in-context examples. Each example consists of a task description (green), an expert trajectory detailing the steps to accomplish that task (blue), and the expected set of sub-goals with their corresponding action sequences (red). We then provide a new task description and trajectory and let the LLM generate the associated sub-goals and segmented actions.
  • Figure 5: a) Average scores across different model sizes for flan-t5 and t5. For t5, X-Large refers to t5-3b. Larger models perform better, and flan-t5 also outperforms t5. Dashed lines represent models that do not condition on any sub-goals ("no sg"), which are equivalent to Swift-only. b) Average scores across different sizes of the sub-goal generator, with the action generator fixed at base (blue) or small (green) size. Larger sub-goal generators can significantly boost the performance of small action generators.
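The clean-up step described for Figure 3 has two parts. Actions the LLM added are dropped, and expert actions it skipped are grouped into runs that are sent back for sub-goal annotation. Below is one possible sketch. The greedy, in-order string matching and the `align_to_expert` name are assumptions, because the captions do not say how generated actions are matched to expert ones. The example actions are also made up. One known weakness of the greedy choice: an extra action that happens to equal a later expert action is matched early, so the expert actions in between are reported as gaps.

```python
from typing import List, Tuple

Segment = Tuple[str, List[str]]  # (sub-goal, actions) as produced by the LLM
Gap = Tuple[int, List[str]]      # (start index in expert path, uncovered actions)


def align_to_expert(expert: List[str], segments: List[Segment]) -> Tuple[List[Segment], List[Gap]]:
    """Greedily align LLM-segmented actions to the expert trajectory (a sketch).

    Returns the cleaned segments (extra actions dropped) and the runs of
    consecutive expert actions that no segment covered.
    """
    covered = [False] * len(expert)
    aligned: List[Segment] = []
    pos = 0  # earliest expert index the next action may match
    for subgoal, actions in segments:
        kept: List[str] = []
        for action in actions:
            # Search forward, in order, for this action in the expert path.
            j = pos
            while j < len(expert) and expert[j] != action:
                j += 1
            if j == len(expert):
                continue  # extra action absent from the expert path: drop it
            covered[j] = True
            kept.append(expert[j])
            pos = j + 1
        if kept:
            aligned.append((subgoal, kept))

    # Group consecutive uncovered expert actions into runs; each run would be
    # sent back to the LLM to obtain a sub-goal for it.
    gaps: List[Gap] = []
    for i, action in enumerate(expert):
        if covered[i]:
            continue
        if gaps and gaps[-1][0] + len(gaps[-1][1]) == i:
            gaps[-1][1].append(action)  # extends the current run
        else:
            gaps.append((i, [action]))  # starts a new run
    return aligned, gaps


expert = ["go to kitchen", "pick up thermometer", "open freezer",
          "move metal pot to freezer", "wait"]
segments = [
    ("navigate_to(kitchen)", ["go to kitchen", "look around"]),  # "look around" is extra
    ("cool(metal pot, freezer)", ["move metal pot to freezer", "wait"]),
]
aligned, gaps = align_to_expert(expert, segments)
print(gaps)  # [(1, ['pick up thermometer', 'open freezer'])]
```

Keeping each gap's start index lets the sub-goal that the LLM later returns for that run be spliced back into the right position in the annotated trajectory.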