Improving the Robustness of Large Language Models for Code Tasks via Fine-tuning with Perturbed Data
Yang Liu, Armstrong Foundjem, Xingfang Wu, Heng Li, Foutse Khomh
TL;DR
This study examines how robust LLMs for code tasks are to input perturbations. It formalizes a black-box threat model and tests robustness under character-, word-, and sentence-level perturbations. It then investigates robustness-focused fine-tuning using SafeCoder instruction tuning, training on 33 variants per base model and evaluating on both perturbed and unperturbed test sets, with Pass@1 and Relative Degradation (RD) as the core metrics. The results show that perturbation-aware fine-tuning significantly improves robustness (RD falls from around 78% to the single-digit percent range for many models) at the cost of a modest drop in clean Pass@1 (roughly 1–3 percentage points on average), with character- and mixed-perturbation strategies delivering the strongest gains. These findings offer practical design guidance for deploying robust LLM4Code systems: combine a balanced mix of perturbations with moderate data scaling to maximize resilience while limiting the loss in clean performance.
Abstract
Context: In the fast-paced evolution of software development, Large Language Models (LLMs) have become indispensable tools for tasks such as code generation, completion, analysis, and bug fixing. Ensuring that these models remain robust across diverse inputs is critical, because variations in input can lead to incorrect or insecure code outputs. Objective: This work aims to improve the robustness of LLMs for coding-related tasks against potentially adversarial inputs. Specifically, we investigate how fine-tuning LLMs with perturbed datasets affects their robustness against input perturbations. Method: We systematically evaluated LLM robustness by fine-tuning models on datasets perturbed at the character level, word level, and sentence level, and comparing the results against base models and models fine-tuned on unperturbed datasets. Results: Fine-tuning LLMs with perturbed datasets significantly improves model robustness (Relative Degradation, RD, usually drops by around 4%-6%), especially for models with relatively weak robustness. However, this fine-tuning typically causes a slight performance decrease (pass@1 usually drops by around 1%-3%) compared to fine-tuning with unperturbed datasets, although occasional performance improvements are observed. Conclusion & Implications: Fine-tuning LLMs for coding tasks with perturbed data effectively enhances their robustness at the cost of a minor performance reduction, emphasizing the importance of balancing the robustness and performance of LLMs for coding applications.
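To make the evaluation setup concrete, the following is a minimal Python sketch of a character-level perturbation and of a relative-degradation computation. Both are assumptions for illustration. The abstract does not define RD, so the sketch uses a common form, the fraction of clean pass@1 lost under perturbation. The function `perturb_chars` is a hypothetical adjacent-letter swap and does not reproduce the perturbation operators used in the study.

```python
import random


def perturb_chars(text: str, rate: float, seed: int = 0) -> str:
    """Swap adjacent letters with probability `rate` (hypothetical perturbation)."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        both_letters = chars[i].isalpha() and chars[i + 1].isalpha()
        if both_letters and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip the swapped pair so each character moves at most once
        else:
            i += 1
    return "".join(chars)


def relative_degradation(pass_clean: float, pass_perturbed: float) -> float:
    """Assumed RD: share of clean pass@1 that is lost on perturbed inputs."""
    if pass_clean <= 0:
        raise ValueError("clean pass@1 must be positive")
    return (pass_clean - pass_perturbed) / pass_clean


if __name__ == "__main__":
    prompt = "Write a function that returns the sum of a list"
    print(perturb_chars(prompt, rate=0.2))
    # Illustrative scores only, not results from the study.
    print(relative_degradation(pass_clean=0.5, pass_perturbed=0.25))
```

Under this assumed definition, an RD of 0 means the perturbed inputs cost nothing, and larger values mean a larger share of clean performance is lost, so a lower RD after fine-tuning indicates a more robust model.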
