ATLaS: Agent Tuning via Learning Critical Steps

Zhixun Chen, Ming Li, Yuxuan Huang, Yali Du, Meng Fang, Tianyi Zhou

TL;DR

ATLaS tackles overfitting and data inefficiency in fine-tuning LLM agents by learning only the critical steps of expert trajectories. It uses an oracle selector to identify four categories of critical steps and trains on a reduced dataset $D_c$ comprising up to 30% of the steps. Experiments across held-in and held-out environments show that ATLaS outperforms full-trajectory finetuning and several baselines, with consistent gains across backbone models. The approach reduces training cost while preserving, and in some cases improving, generalization across diverse tasks.

Abstract

Large Language Model (LLM) agents have demonstrated remarkable generalization capabilities across multi-domain tasks. Existing agent tuning approaches typically employ supervised finetuning on entire expert trajectories. However, behavior cloning of full trajectories can introduce expert bias and weaken generalization to states not covered by the expert data. Additionally, critical steps, such as planning, complex reasoning for intermediate subtasks, and strategic decision-making, are essential to success in agent tasks, so learning these steps is the key to improving LLM agents. For more effective and efficient agent tuning, we propose ATLaS, which identifies the critical steps in expert trajectories and finetunes LLMs solely on these steps at reduced cost. By steering the training's focus to a few critical steps, our method mitigates the risk of overfitting to entire trajectories and promotes generalization across different environments and tasks. In extensive experiments, an LLM finetuned on only the 30% of steps selected as critical by ATLaS outperforms the LLM finetuned on all steps and recent open-source LLM agents. ATLaS maintains and improves the base LLM's skills as a generalist agent interacting with diverse environments.

Paper Structure

This paper contains 32 sections, 10 equations, 3 figures, 7 tables.

Figures (3)

  • Figure 1: The proposed ATLaS identifies the critical steps in expert trajectories collected from diverse interactive environments and finetunes the agent on these steps only, where $a_i$ represents the expert action in step-$i$. ATLaS alleviates the potential overfitting to experts' every-step behaviors and achieves better generalizability by training on much fewer steps ("less is better").
  • Figure 2: Three base LLMs finetuned by ATLaS vs. full trajectories (100% of the steps), evaluated on held-in and held-out agentic tasks. ATLaS consistently outperforms full-trajectory finetuning, indicating better generalizability of ATLaS by training on fewer but critical steps.
  • Figure 3: Overview of ATLaS. The selector identifies critical steps in expert trajectories collected in multiple environments, where "O" and "A" denote observation and action, respectively. Training loss is only computed on the critical steps. This encourages more exploration of non-critical steps, reduces the training cost, and improves the agent's generalization performance.
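
The idea in Figure 3 of computing the training loss only on critical steps can be illustrated with a small loss-masking sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `critical_step_loss`, the per-token log-probability input, and the per-token step ids are all hypothetical, and the set of critical steps is assumed to have been produced already by some selector.

```python
from typing import Sequence, Set


def critical_step_loss(
    token_log_probs: Sequence[float],
    token_step_ids: Sequence[int],
    critical_steps: Set[int],
) -> float:
    """Mean negative log-likelihood over tokens of critical steps only.

    All names here are hypothetical. token_log_probs[i] is the model's
    log-probability of the i-th expert action token, and token_step_ids[i]
    is the index of the trajectory step that token belongs to.
    """
    if len(token_log_probs) != len(token_step_ids):
        raise ValueError("one step id is required per token")

    # 1.0 for tokens inside a critical step, 0.0 for all other tokens.
    mask = [1.0 if step in critical_steps else 0.0 for step in token_step_ids]
    kept = sum(mask)
    if kept == 0:
        # No critical tokens in this trajectory: contribute no loss.
        return 0.0

    # Tokens of non-critical steps are multiplied by 0 and so are ignored.
    total = -sum(lp * m for lp, m in zip(token_log_probs, mask))
    return total / kept


if __name__ == "__main__":
    log_probs = [-1.0, -2.0, -3.0, -4.0]
    step_ids = [0, 0, 1, 2]
    # Only step 1 is treated as critical in this toy example.
    print(critical_step_loss(log_probs, step_ids, {1}))
```

In this toy call only the single token of step 1 contributes, so tokens from steps 0 and 2 receive no gradient signal. In a real finetuning setup the same effect is usually obtained by masking label positions before the cross-entropy loss; that detail is likewise an assumption here.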