Octopus: On-device language model for function calling of software APIs

Wei Chen, Zhiyuan Li, Mingyuan Ma

TL;DR

This work targets on-device language models for reliable function calling of software APIs by building a large, high-quality dataset from API documentation and fine-tuning small to mid-size LLMs with curriculum learning and LoRA. A novel inference-time conditional masking mechanism constrains outputs to the required function names and argument formats, reducing validation errors without compromising speed. The Octopus family is competitive with GPT-4 on API interactions, and several 7B models surpass GPT-4 in certain scenarios when conditional masking is applied. The approach enables practical, efficient API integration on mobile and edge devices and promises to accelerate automated software development; the dataset is slated for open-source release.

Abstract

In the rapidly evolving domain of artificial intelligence, Large Language Models (LLMs) play a crucial role due to their advanced text processing and generation abilities. This study introduces a new strategy for using on-device LLMs to invoke software APIs. We meticulously compile a dataset derived from software API documentation and fine-tune LLMs with 2B, 3B and 7B parameters, specifically to enhance their proficiency in software API interactions. Our approach concentrates on refining the models' grasp of API structures and syntax, significantly improving the accuracy of API function calls. Additionally, we propose conditional masking techniques that keep outputs in the desired formats and reduce error rates while maintaining inference speed. We also propose a novel benchmark designed to evaluate the effectiveness of LLMs in API interactions, establishing a foundation for subsequent research. Octopus, the fine-tuned model, is shown to outperform GPT-4 at calling software APIs. This research aims to advance automated software development and API integration, representing substantial progress in aligning LLM capabilities with the demands of practical software engineering applications.
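
The abstract does not spell out how conditional masking works, so the following is only a minimal sketch of one plausible reading: at each decoding step, tokens that cannot extend a valid function name have their logits set to negative infinity before the next token is chosen. The toy vocabulary, the function names, the logits, and the helper names (`allowed_next_tokens`, `apply_mask`, `constrained_decode`) are all hypothetical and are not taken from the paper.

```python
import math

def allowed_next_tokens(prefix, function_names, vocab):
    """Indices of tokens that keep prefix + token a prefix of some allowed name."""
    return [
        idx
        for idx, tok in enumerate(vocab)
        if tok and any(name.startswith(prefix + tok) for name in function_names)
    ]

def apply_mask(logits, allowed):
    """Keep the logits of allowed tokens; set every other logit to -inf."""
    allowed_set = set(allowed)
    return [x if i in allowed_set else -math.inf for i, x in enumerate(logits)]

def constrained_decode(step_logits, function_names, vocab):
    """Greedy decoding in which invalid tokens are masked at every step."""
    prefix = ""
    for logits in step_logits:
        allowed = allowed_next_tokens(prefix, function_names, vocab)
        if not allowed:  # nothing can extend the prefix, so the name is complete
            break
        masked = apply_mask(logits, allowed)
        best = max(range(len(masked)), key=masked.__getitem__)
        prefix += vocab[best]
    return prefix

def unconstrained_decode(step_logits, vocab):
    """Plain greedy decoding, shown only for comparison."""
    return "".join(
        vocab[max(range(len(logits)), key=logits.__getitem__)]
        for logits in step_logits
    )

# Hypothetical toy setup: a six-token vocabulary and three API function names.
vocab = ["get_", "weather", "time", "delete", "_file", "foo"]
function_names = ["get_weather", "get_time", "delete_file"]

# Made-up per-step logits; without a mask the model would prefer "foo", then "_file".
step_logits = [
    [1.0, 0.2, 0.1, 0.5, 0.0, 3.0],
    [0.0, 0.3, 0.9, 0.1, 2.0, 0.5],
]

unconstrained = unconstrained_decode(step_logits, vocab)
result = constrained_decode(step_logits, function_names, vocab)
print(unconstrained)  # not a valid function name
print(result)         # a valid function name
```

Because the mask is applied to the logits before the token is selected, an invalid function name can never be emitted, and the extra work per step in this sketch is a scan over the vocabulary rather than an additional model call.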

Paper Structure (16 sections, 7 equations, 4 figures)

Figures (4)

  • Figure 1: Refining dataset A into dataset B through a strict workflow. This process involves three critical steps: sampling positive queries solvable by specific APIs and generating corresponding responses and CoTs; identifying unsolvable queries and augmenting them with irrelevant function bodies; and employing semantic analysis to incorporate similar functions into data points. Following GPT-4's rigorous verification, Dataset B emerges as the optimized training dataset, poised to significantly elevate model efficacy.
  • Figure 2: The training and validation loss for selected pretrained models.
  • Figure 3: Comparison of accuracy between the GPT-3.5 and GPT-4 models and our pretrained models in the "Octopus" series. The prefix "Octopus" denotes the series, while the suffix indicates the specific pretrained model's name.
  • Figure 4: Comparison of accuracy between the GPT-3.5 and GPT-4 models, alongside our four pretrained models within the "Octopus" series, following the introduction of a conditional mask. "Octopus" serves as the series name, with the suffix indicating the specific name of each pretrained model. This comparison highlights the impact of conditional masking on model performance.