LaMDAgent: An Autonomous Framework for Post-Training Pipeline Optimization via LLM Agents

Taro Yano, Yoichi Ishibashi, Masafumi Oyamada

TL;DR

LaMDAgent is an autonomous, LLM-based agent framework that constructs and optimizes end-to-end post-training pipelines for large language models by unifying supervised fine-tuning, preference learning, and model merging. It operates through a four-step loop (action enumeration, action selection, model evaluation, and memory update), using memory of past trials to discover high-performing pipelines with minimal human input. In separate experiments, LaMDAgent improves tool usage by 9.0 points and math-related tasks by 3.7 points while maintaining general capabilities. The work also shows that data-size scaling makes exploration cost-effective and that the framework can uncover non-obvious, high-performing pipelines, offering a new automated approach to tailoring LLMs for specific domains and tasks.
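
The four-step loop can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Action` and `Memory` types, the `"sft"`/`"merge"` labels, the string-based "models", and the caller-supplied `evaluate` function are all hypothetical, and the epsilon-greedy rule in `select_action` is only a stand-in for the LLM agent that reads memory and picks the next action.

```python
import itertools
import random
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    kind: str   # hypothetical labels: "sft" or "merge"
    model: str  # model the action starts from
    other: str  # dataset name (sft) or second model (merge)


@dataclass
class Memory:
    trials: list = field(default_factory=list)  # (action, score) history


def enumerate_actions(models, datasets):
    """Step 1: combine action types with objects in the pool."""
    acts = [Action("sft", m, d) for m in models for d in datasets]
    acts += [Action("merge", a, b) for a, b in itertools.combinations(models, 2)]
    return acts


def select_action(actions, memory, rng, epsilon=0.3):
    """Step 2: placeholder epsilon-greedy policy standing in for the LLM agent."""
    if not memory.trials or rng.random() < epsilon:
        return rng.choice(actions)  # explore
    best_action, _ = max(memory.trials, key=lambda t: t[1])
    # Exploit: prefer actions that reuse the best trial's dataset or partner model.
    similar = [a for a in actions if a.other == best_action.other]
    return rng.choice(similar or actions)


def run(models, datasets, evaluate, iterations=10, seed=0):
    rng = random.Random(seed)
    memory = Memory()
    models = list(models)
    for _ in range(iterations):
        actions = enumerate_actions(models, datasets)      # Step 1
        action = select_action(actions, memory, rng)       # Step 2
        # "Executing" an action here just builds a name for the new model.
        new_model = f"{action.kind}({action.model},{action.other})"
        score = evaluate(new_model)                        # Step 3
        memory.trials.append((action, score))              # Step 4
        if new_model not in models:
            models.append(new_model)  # new model joins the object pool
    best = max(memory.trials, key=lambda t: t[1])
    return best, memory
```

In this sketch the memory update only appends a score; in the framework described here, the agent also records insights and promising future directions, which a real implementation would obtain by prompting the LLM.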

Abstract

Large Language Models (LLMs) have demonstrated exceptional performance across a wide range of tasks. To further tailor LLMs to specific domains or applications, post-training techniques such as Supervised Fine-Tuning (SFT), Preference Learning, and model merging are commonly employed. While each of these methods has been extensively studied in isolation, the automated construction of complete post-training pipelines remains an underexplored area. Existing approaches typically rely on manual design or focus narrowly on optimizing individual components, such as data ordering or merging strategies. In this work, we introduce LaMDAgent (short for Language Model Developing Agent), a novel framework that autonomously constructs and optimizes full post-training pipelines through the use of LLM-based agents. LaMDAgent systematically explores diverse model generation techniques, datasets, and hyperparameter configurations, leveraging task-based feedback to discover high-performing pipelines with minimal human intervention. Our experiments show that LaMDAgent improves tool-use accuracy by 9.0 points while preserving instruction-following capabilities. Moreover, it uncovers effective post-training strategies that are often overlooked by conventional human-driven exploration. We further analyze the impact of data and model size scaling as a way to reduce the computational cost of exploration, finding that scaling model size introduces new challenges, whereas scaling data size enables cost-effective pipeline discovery.

Paper Structure

This paper contains 18 sections, 3 equations, 11 figures, 3 tables.

Figures (11)

  • Figure 1: Overview of our LaMDAgent framework. LaMDAgent first enumerates actions from predefined model-improving action types and an object pool containing available data, models, parameters, and other objects (Step 1: Action Enumeration). Next, the agent selects an action based on memory acquired from previous trials and executes the selected action to generate a new model (Step 2: Action Selection). The new model is then evaluated on downstream tasks (Step 3: Model Evaluation). Based on the evaluation results of the newly generated model, the agent considers promising future directions and insights and updates the accumulated memory (Step 4: Memory Update).
  • Figure 2: Prompt template to update memory
  • Figure 3: Top-1, Top-2, and Top-3 pipelines discovered in Experiment 1.
  • Figure 4: LaMDAgent significantly improves tool usage capability while maintaining instruction-following performance: The overall performance evaluation results of Experiment 2 indicate that LaMDAgent improves AceBench accuracy by 9.0 points while preserving the MT-Bench score. In contrast, naive fine-tuning approaches on either individual or full SFT datasets fail to enhance tool usage capabilities, suggesting that the task cannot be effectively addressed with such straightforward methods.
  • Figure 5: LaMDAgent learns from feedback to exploit promising actions while exploring unseen pipelines: The graph shows the Average Score, Max Score, and Standard Deviation recorded every 15 iterations. The consistent increase in average score indicates that the agent continues to learn from past feedback to exploit promising actions. The non-zero standard deviation through all iterations and improving max score implies that the agent maintains exploration to discover further improvement opportunities alongside exploitation.
  • ...and 6 more figures
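
The statistics named in the Figure 5 caption (average score, max score, and standard deviation every 15 iterations) could be computed along these lines. This is a sketch under stated assumptions: it treats "every 15 iterations" as non-overlapping windows of 15 consecutive trial scores and uses the population standard deviation; the source does not say which convention the authors used, and `window_stats` is a hypothetical helper name.

```python
import statistics


def window_stats(scores, window=15):
    """Return (mean, max, population std) for each non-overlapping window.

    Assumption: windows are consecutive, non-overlapping blocks of `window`
    scores; the last block may be shorter.
    """
    out = []
    for start in range(0, len(scores), window):
        chunk = scores[start:start + window]
        out.append((statistics.fmean(chunk), max(chunk), statistics.pstdev(chunk)))
    return out
```

Read this way, a rising mean across windows corresponds to exploitation of actions that worked before, while a non-zero standard deviation within each window indicates that the agent is still trying varied pipelines.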