CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance

Yupu Hao, Pengfei Cao, Zhuoran Jin, Huanxuan Liao, Yubo Chen, Kang Liu, Jun Zhao

TL;DR

This paper tackles the trade-off between enabling LLMs to use external tools and preserving their general reasoning abilities. It reveals that fine-tuning on tool-learning data induces a co-directional shift in hidden representations and that certain model components are consistently more influential across tasks. The authors propose CITI, a component-aware framework that applies Mixture-of-LoRA adapters to important components and performs selective full fine-tuning on unimportant components, guided by gradient-based importance scores and a router that separates tool-related from tool-unrelated inputs. Through extensive experiments on API-Bank and ToolAlpaca, coupled with ablations, CITI demonstrates strong tool-utilization performance while maintaining superior general abilities compared with full fine-tuning and standard LoRA baselines. This work offers a practical pathway to integrate tool usage into LLMs with minimal forgetting, advancing robust, tool-enabled NLP systems.

Abstract

Tool learning enables Large Language Models (LLMs) to interact with the external environment by invoking tools, improving their accuracy and broadening their capabilities. However, previous work predominantly focuses on improving a model's tool-utilizing accuracy and its ability to generalize to new, unseen tools, forcing LLMs to adopt specific tool-invoking patterns without considering the harm to the model's general performance. This deviates from practical applications and from the original intention of integrating tools to enhance the model. To tackle this problem, we dissect the capability trade-offs by examining the changes in hidden representations and the gradient-based importance scores of the model's components. Based on this analysis, we propose a Component Importance-based Tool-utilizing ability Injection method (CITI). Guided by the gradient-based importance scores of different components, CITI alleviates the capability conflicts caused by fine-tuning by applying distinct training strategies to different components. It applies Mixture-of-LoRA (MOLoRA) to important components. Meanwhile, it fine-tunes the parameters of a few components in the LLM backbone that are deemed less important, while keeping the other parameters frozen. CITI effectively enhances the model's tool-utilizing capability without excessively compromising its general performance. Experimental results demonstrate that our approach achieves outstanding performance across a range of evaluation metrics.
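
The component-wise strategy described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the importance formula (the sum of |θ · ∂L/∂θ| over a component's parameters, a common first-order Taylor proxy), the function names, and the `n_important` and `n_unimportant` cutoffs are all hypothetical choices made for this example.

```python
from typing import Dict, List


def importance_score(params: List[float], grads: List[float]) -> float:
    """Assumed proxy for a gradient-based importance score.

    Sums |theta * dL/dtheta| over one component's parameters. The paper's
    exact formula is not given in this section, so this is an assumption.
    """
    if len(params) != len(grads):
        raise ValueError("params and grads must have the same length")
    return sum(abs(p * g) for p, g in zip(params, grads))


def assign_strategies(
    scores: Dict[str, float], n_important: int, n_unimportant: int
) -> Dict[str, str]:
    """Map each component name to a training strategy.

    The highest-scoring components get MOLoRA adapters, the lowest-scoring
    ones are fully fine-tuned, and everything in between stays frozen.
    """
    if n_important < 0 or n_unimportant < 0:
        raise ValueError("cutoffs must be non-negative")
    if n_important + n_unimportant > len(scores):
        raise ValueError("cutoffs exceed the number of components")

    # Rank components from most to least important.
    ranked = sorted(scores, key=scores.get, reverse=True)
    plan = {name: "frozen" for name in ranked}
    for name in ranked[:n_important]:
        plan[name] = "molora"
    if n_unimportant > 0:
        for name in ranked[-n_unimportant:]:
            plan[name] = "full_finetune"
    return plan
```

In practice the scores would be accumulated from gradients over a dataset and computed per attention or FFN module; the hypothetical cutoffs here stand in for whatever selection rule the method actually uses.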

Paper Structure (32 sections, 11 equations, 16 figures, 7 tables)

Figures (16)

  • Figure 1: The model's performance on tool-utilizing and general tasks. FT denotes full-parameter fine-tuning on the API-Bank dataset (Li et al., 2023).
  • Figure 2: Cosine similarity of $ICC$ between the inputs of different Feed-Forward Network (FFN) layers in Meta-Llama-3-8B-Instruct. Notation with an asterisk (*) denotes $ICC$ after fine-tuning on the code-related dataset (e.g., TQA* is the $ICC$ of TriviaQA for the model trained on the code dataset); notation without an asterisk denotes $ICC$ after fine-tuning on the tool-learning dataset.
  • Figure 4: The overall architecture of CITI. The MOLoRA adapters are applied to important components identified by $\mathcal{M}_h$, and unimportant components identified by $\mathcal{C}_h$ are fine-tuned with all parameters.
  • Figure 5: Case study of the model's output.
  • Figure 6: Router weight allocation for tool-related data.
  • ...and 11 more figures
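
The MOLoRA adapters named in the Figure 4 caption, together with the router mentioned in the TL;DR, can be illustrated with a small forward-pass sketch. It assumes a generic Mixture-of-LoRA formulation (frozen weight plus a softmax-gated sum of low-rank updates); the function names, the linear router, and the `scale` parameter are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def molora_forward(
    x: Vector,
    frozen_w: Matrix,
    experts: List[Tuple[Matrix, Matrix]],
    router_w: Matrix,
    scale: float = 1.0,
) -> Vector:
    """Compute W x + scale * sum_i g_i(x) * B_i (A_i x).

    frozen_w is the frozen backbone weight, each expert is a pair (A_i, B_i)
    of low-rank matrices, and router_w yields one logit per expert.
    """
    out = matvec(frozen_w, x)
    gates = softmax(matvec(router_w, x))
    for gate, (a, b) in zip(gates, experts):
        delta = matvec(b, matvec(a, x))  # low-rank update B_i (A_i x)
        out = [o + scale * gate * d for o, d in zip(out, delta)]
    return out
```

With all `B_i` initialized to zero, as is common for LoRA, the layer reproduces the frozen backbone output exactly at the start of training, which is one reason adapter-based updates tend to disturb general abilities less than full fine-tuning.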