Lightweight Model Editing for LLMs to Correct Deprecated API Recommendations

Guancheng Lin, Xiao Yu, Jacky Keung, Xing Hu, Xin Xia, Alex X. Liu

TL;DR

The paper tackles the problem of outdated API knowledge in LLMs used for code completion by introducing EDAPIBench, a fully automated benchmark for deprecated API editing, and a family of editing methods. Through a systematic study of 10 editing techniques across three LLMs, it finds AdaLoRA excels in effectiveness, generalization, and portability but suffers from poor specificity. To address this, the authors propose AdaLoRA-L, which confines edits to API-specific layers using a gradient-based layer-importance signal $S_i$, yielding substantial improvements in specificity while preserving other strengths. The work provides a standardized platform and a practical, scalable approach for updating LLMs with up-to-date API usage, reducing technical debt in software tooling that relies on code-generating models.

Abstract

Pre-trained or fine-tuned on large code corpora, Large Language Models (LLMs) have demonstrated strong performance in code completion tasks. However, their embedded knowledge is constrained by the timeliness of training data, which often includes code using deprecated APIs. Consequently, LLMs frequently generate deprecated APIs that will no longer be supported in future versions of third-party libraries. While retraining LLMs on updated codebases could refresh their API knowledge, this approach is computationally expensive. Recently, lightweight model editing methods have emerged to efficiently correct specific knowledge in LLMs. However, it remains unclear whether these methods can effectively update deprecated API knowledge and enable edited models to generate up-to-date APIs. To address this gap, we conduct the first systematic study applying 10 state-of-the-art model editing techniques to update deprecated API knowledge in three LLMs: Qwen2.5-Coder, StarCoder2, and DeepSeek-Coder. We introduce EDAPIBench, a dedicated benchmark featuring over 70 deprecated APIs from 8 popular Python libraries, with more than 3,000 editing instances. Our results show that the parameter-efficient fine-tuning method AdaLoRA achieves the best performance in enabling edited models to generate correct, up-to-date APIs, but falls short in Specificity (i.e., the editing influences untargeted knowledge). To resolve this, we propose AdaLoRA-L, which defines "Common API Layers" (layers within the LLMs with high importance across all APIs, storing general knowledge and excluded from editing) and restricts edits exclusively to "Specific API Layers" (layers with high importance only for the target API, storing the API-specific knowledge). Experimental results demonstrate that AdaLoRA-L significantly improves Specificity while maintaining comparable performance across other evaluation metrics.
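The split between Common API Layers and Specific API Layers described in the abstract can be illustrated with a small sketch. It assumes each API already has one importance score per layer (in the paper these come from a gradient-based signal $S_i$, whose exact definition is not reproduced here). The top-k selection rule, the function names, the API labels, and the scores below are all hypothetical choices made for illustration, not the authors' procedure.

```python
from typing import Dict, List, Set, Tuple


def top_layers(scores: List[float], k: int) -> Set[int]:
    """Return the indices of the k layers with the highest importance."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:k])


def partition_layers(
    importance: Dict[str, List[float]], target_api: str, k: int
) -> Tuple[Set[int], Set[int]]:
    """Split layers into (common, specific) sets for one target API.

    Assumption: a layer is "common" if it is among the top-k layers of
    every API, and "specific" if it is among the target API's top-k
    layers but is not common.
    """
    per_api_top = {api: top_layers(s, k) for api, s in importance.items()}
    common = set.intersection(*per_api_top.values())
    specific = per_api_top[target_api] - common
    return common, specific


# Hypothetical importance scores for a 6-layer model and three APIs.
importance = {
    "api_a": [0.1, 0.9, 0.2, 0.8, 0.7, 0.1],
    "api_b": [0.6, 0.8, 0.1, 0.2, 0.9, 0.1],
    "api_c": [0.1, 0.7, 0.8, 0.1, 0.9, 0.2],
}

common, specific = partition_layers(importance, target_api="api_a", k=3)
print("common layers (excluded from editing):", sorted(common))
print("specific layers (edited for api_a):", sorted(specific))
```

In this toy setting, layers 1 and 4 rank highly for every API and are left untouched, while layer 3 ranks highly only for `api_a` and would be the one edited. A real pipeline would then attach the parameter-efficient adapters only to the modules in the specific set.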

Paper Structure

This paper contains 24 sections, 21 figures, 6 tables.

Figures (21)

  • Figure 1: The construction process of EDAPIBench.
  • Figure 3: The distribution of the number of target APIs from different libraries in EDAPIBench.
  • Figure 4: The average time cost (seconds) of the model editing methods per edit.
  • Figure 5: The average peak memory cost (GB) of the model editing methods per edit.
  • Figure 6: The identification process of Specific API Layers and Common API Layers.
  • ...and 16 more figures