ADC: Enhancing Function Calling Via Adversarial Datasets and Code Line-Level Feedback

Wei Zhang, Yi Zhang, Li Zhu, Qianghuai Jia, Feijun Jiang, Hongcheng Guo, Zhoujun Li, Mengping Zhou

TL;DR

The paper addresses robustness gaps in large language models on complex function calls, particularly in following exact function formats and matching multi-parameter inputs. It introduces ADC, a training paradigm that combines line-level code execution feedback, an adversarial data-generation loop, and staged training to improve function-calling accuracy. On the BFCL benchmark, ADC reports execution performance of 87.50 and an overall score of 79.01, surpassing many baselines and showing improved logical reasoning and parameter matching. The work contributes a data pipeline (code plus execution feedback), an adversarial refinement loop, and a staged training strategy aimed at reliable function calling in real-world LLM applications.

Abstract

Large Language Models (LLMs) have made significant strides in Natural Language Processing and coding, yet they struggle with robustness and accuracy in complex function calls. To tackle these challenges, this paper introduces ADC, an innovative approach that enhances LLMs' ability to follow function formats and match complex parameters. ADC utilizes a high-quality code fine-tuning dataset with line-level execution feedback, providing granular process supervision that fosters strong logical reasoning and adherence to function formats. It also employs an adversarial dataset generation process to improve parameter matching. The staged training methodology capitalizes on both enriched code datasets and refined adversarial datasets, leading to marked improvements in function calling capabilities on the Berkeley Function-Calling Leaderboard (BFCL) Benchmark. The innovation of ADC lies in its strategic combination of process supervision, adversarial refinement, and incremental learning, setting a new standard for LLM proficiency in complex function calling.
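
To make "line-level execution feedback" concrete, here is a minimal Python sketch of one way such feedback could be collected and embedded into code. It is an illustration under stated assumptions, not the authors' pipeline: the helper name annotate_with_feedback, the "[exec]" comment format, and the choice of sys.settrace are all assumptions of this sketch.

```python
import sys


def annotate_with_feedback(source: str, func_name: str, *args) -> str:
    """Run `func_name` from `source` and append what each executed line did.

    Hypothetical helper: the name and the "# [exec]" comment format are
    assumptions for illustration, not taken from the paper.
    """
    namespace = {}
    exec(compile(source, "<snippet>", "exec"), namespace)

    notes = {}  # line number -> feedback text (last execution wins in loops)
    state = {"line": None, "before": {}}

    def tracer(frame, event, arg):
        code = frame.f_code
        # Only trace the target function inside the compiled snippet.
        if code.co_filename != "<snippet>" or code.co_name != func_name:
            return None
        # A "line" event fires before a line runs, so the next event shows
        # the state left behind by the previously announced line.
        if event in ("line", "return") and state["line"] is not None:
            after = dict(frame.f_locals)
            before = state["before"]
            parts = [
                f"{name}={value!r}"
                for name, value in after.items()
                if name not in before or before[name] != value
            ]
            if event == "return":
                parts.append(f"return {arg!r}")
            if parts:
                notes[state["line"]] = ", ".join(parts)
        if event == "line":
            state["line"] = frame.f_lineno
            state["before"] = dict(frame.f_locals)
        elif event == "return":
            state["line"] = None
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        namespace[func_name](*args)
    finally:
        sys.settrace(previous)  # restore whatever tracer was active before

    annotated = []
    for number, text in enumerate(source.splitlines(), start=1):
        if number in notes:
            text = f"{text}  # [exec] {notes[number]}"
        annotated.append(text)
    return "\n".join(annotated)


SNIPPET = """def area(width, height):
    total = width * height
    doubled = total * 2
    return doubled
"""

if __name__ == "__main__":
    print(annotate_with_feedback(SNIPPET, "area", 3, 4))
```

Running the script prints the snippet with each executed line followed by the variables it changed, which is one plausible shape for code with feedback embedded as process supervision. The sketch keeps only the last execution of a line, so loops would need a list of snapshots per line.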

Paper Structure

This paper contains 24 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Overview of ADC. We first create a detailed code dataset with line-level execution feedback by executing the code and embedding the feedback into the code. Then, we employ an LLM generator and an LLM discriminator to refine the function calling dataset. The staged training process leverages both datasets to improve the function calling ability of ADC.
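
The generator and discriminator loop in the caption can be sketched roughly as follows. This is a toy illustration only: in the paper both roles are played by LLMs, whereas here they are plain Python functions, and the schema layout, the names check_call, refine, and toy_generator, and the retry limit are all assumptions of this sketch rather than details from the source.

```python
from typing import Callable, Optional

# Assumed layouts, for illustration only:
#   schema = {"name": str, "parameters": {param: type}, "required": [param, ...]}
#   call   = {"name": str, "arguments": {param: value}}


def check_call(schema: dict, call: dict) -> list:
    """Rule-based stand-in for the discriminator: list parameter-matching problems."""
    problems = []
    if call.get("name") != schema["name"]:
        problems.append(f"wrong function name: {call.get('name')!r}")
    arguments = call.get("arguments", {})
    for param in schema["required"]:
        if param not in arguments:
            problems.append(f"missing required parameter: {param}")
    for param, value in arguments.items():
        expected = schema["parameters"].get(param)
        if expected is None:
            problems.append(f"unknown parameter: {param}")
        elif not isinstance(value, expected):
            problems.append(f"{param} should be {expected.__name__}")
    return problems


def refine(
    schema: dict,
    generate: Callable[[dict, list], dict],
    max_rounds: int = 3,
) -> Optional[dict]:
    """Feed discriminator critiques back to the generator until a call passes."""
    critique = []
    for _ in range(max_rounds):
        call = generate(schema, critique)
        critique = check_call(schema, call)
        if not critique:
            return call  # accepted: safe to add to the refined dataset
    return None  # still failing after max_rounds: drop the sample


def toy_generator(schema: dict, critique: list) -> dict:
    """Stand-in for the LLM generator: fixes a missing `days` once told about it."""
    arguments = {"city": "Paris"}
    if any("days" in problem for problem in critique):
        arguments["days"] = 3
    return {"name": schema["name"], "arguments": arguments}


WEATHER = {
    "name": "get_forecast",
    "parameters": {"city": str, "days": int},
    "required": ["city", "days"],
}

if __name__ == "__main__":
    print(refine(WEATHER, toy_generator))
```

The design point the sketch tries to capture is that rejected samples are not simply discarded: the discriminator's critique is returned to the generator, so each round targets the specific parameter mismatch that failed.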