Agent Fine-tuning through Distillation for Domain-specific LLMs in Microdomains

Yawen Xue, Masaya Tsunokake, Yuta Koreeda, Ekant Muljibhai Amin, Takashi Sumiyoshi, Yasuhiro Sogawa

TL;DR

This work investigates domain-specific fine-tuning of agentic LLMs for Hitachi's JP1 middleware, a specialized IT microdomain, to improve autonomous reasoning. The authors apply continual pre-training (CPT) on JP1 manuals and supervised fine-tuning (SFT) on JP1-domain and public trajectories (ReAct/CoT), and at inference combine retrieval-augmented generation with a context-answer extractor. On JP1 certification multiple-choice questions (MCQs), agentic prompting and CPT substantially boost accuracy, whereas including CoT trajectories in fine-tuning sometimes reduces performance at higher expertise levels. The proposed method can outperform GPT-4 on advanced JP1 tasks. The study highlights the practical potential of domain-adaptive agent fine-tuning for reliable, efficient decision-making in complex microdomains.

Abstract

Agentic large language models (LLMs) have become prominent for autonomously interacting with external environments and performing multi-step reasoning tasks. Most approaches leverage these capabilities via in-context learning with few-shot prompts, but this often results in lengthy inputs and higher computational costs. Agent fine-tuning offers an alternative by enabling LLMs to internalize procedural reasoning and domain-specific knowledge through training on relevant data and demonstration trajectories. Prior studies have focused on general domains, so the effectiveness of agent fine-tuning in specialized technical microdomains remains unclear. This paper explores agent fine-tuning for domain adaptation within Hitachi's JP1 middleware, a microdomain for specialized IT operations. We fine-tuned LLMs using JP1-specific datasets derived from domain manuals and distilled reasoning trajectories generated by LLMs themselves, enhancing decision-making accuracy and search efficiency. During inference, we used an agentic prompt with retrieval-augmented generation and introduced a context-answer extractor to improve information relevance. On JP1 certification exam questions, our method achieved a 14% performance improvement over the base model, demonstrating the potential of agent fine-tuning for domain-specific reasoning in complex microdomains.
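
The inference setup described in the abstract (an agentic prompt, retrieval, and a context-answer extractor) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the toy corpus, the word-overlap retriever, the sentence-level extractor, and the `scripted_policy` function that stands in for the fine-tuned LLM are all hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical toy corpus standing in for manual passages; not real JP1 text.
CORPUS = [
    "Service A writes its log to file alpha. Service A restarts nightly.",
    "Service B writes its log to file beta. Service B has no scheduler.",
]


def tokens(text: str) -> set:
    """Lowercase word set with trailing punctuation stripped."""
    return {w.strip(".,?").lower() for w in text.split()}


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy retriever: rank passages by word overlap with the query."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def extract_context(question: str, passages: List[str]) -> str:
    """Toy extractor: keep only the sentence that overlaps most with the question."""
    q = tokens(question)
    sentences = [s.strip() for p in passages for s in p.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q & tokens(s)))


# A policy maps (question, last observation) to an (action, argument) pair.
Policy = Callable[[str, str], Tuple[str, str]]


def run_agent(question: str, policy: Policy, max_steps: int = 4):
    """Search/observe loop: the policy either searches again or finishes."""
    trajectory: List[str] = []
    observation = ""
    for _ in range(max_steps):
        action, arg = policy(question, observation)
        if action == "finish":
            trajectory.append(f"Finish[{arg}]")
            return arg, trajectory
        passages = retrieve(arg, CORPUS)
        # The extractor narrows the retrieved text before the policy sees it.
        observation = extract_context(question, passages)
        trajectory.append(f"Search[{arg}] -> {observation}")
    return "", trajectory


def scripted_policy(question: str, observation: str) -> Tuple[str, str]:
    """Stand-in for the LLM: search once, then answer from the observation."""
    if not observation:
        return ("search", question)
    return ("finish", "beta" if "beta" in observation else "alpha")


answer, trajectory = run_agent(
    "Which file does Service B write its log to?", scripted_policy
)
```

In this sketch the extractor passes a single relevant sentence to the policy instead of the full retrieved passage, which is one plausible reading of how such a component could improve information relevance.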

Paper Structure

This paper contains 21 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Pipeline of Language Agent Fine-Tuning
  • Figure 2: Illustration of the process for generating agent trajectories during inference, referred to as the Trajectory Generator.