On the Adversarial Robustness of Instruction-Tuned Large Language Models for Code

Md Imran Hossen; Xiali Hei

TL;DR

The paper introduces DegradePrompter, a robustness evaluation framework for instruction-tuned Code LLMs that generates adversarial prompts via a black-box oracle LLM to test functional correctness. It benchmarks eight models (five open-source and three commercial) on HumanEval and MBPP using metrics like $pass@1$, $CDRA$, and $ANR$, revealing substantial robustness gaps, especially among open-source models (roughly $12$–$34\%$ degradation) compared with commercial models (roughly $3$–$24\%$). A guided prompting defense is proposed to mitigate adversarial effects, showing mixed but often meaningful improvements in $ANR$, with performance depending on model type and dataset. Overall, the work highlights the importance of robust model design and comprehensive evaluation for dependable automated code generation systems, and points to future work in language- and task-agnostic defenses, broader programming-language coverage, and more sophisticated attack/defense scenarios.

Abstract

The advent of instruction-tuned Large Language Models designed for coding tasks (Code LLMs) has transformed software engineering practices. However, their robustness against various input challenges remains a critical concern. This study introduces DegradePrompter, a novel method designed to systematically evaluate the robustness of instruction-tuned Code LLMs. We assess the impact of diverse input challenges on the functionality and correctness of generated code using rigorous metrics and established benchmarks. Our comprehensive evaluation includes five state-of-the-art open-source models and three production-grade closed-source models, revealing varying degrees of robustness. Open-source models demonstrate an increased susceptibility to input perturbations, resulting in declines in functional correctness ranging from 12% to 34%. In contrast, commercial models demonstrate relatively greater resilience, with performance degradation ranging from 3% to 24%. To enhance the robustness of the models against these vulnerabilities, we investigate a straightforward yet effective mitigation strategy. Our findings highlight the need for robust defense mechanisms and comprehensive evaluations during both the development and deployment phases to ensure the resilience and reliability of automated code generation systems.
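The degradation figures above are reported in terms of functional correctness, which benchmarks such as HumanEval and MBPP usually measure with $pass@k$. As a minimal sketch, the snippet below computes the standard unbiased $pass@k$ estimate per problem and then a relative drop between clean and perturbed prompts. The helper names (`pass_at_k`, `mean_pass_at_1`, `relative_drop`) and the toy results are hypothetical, and treating the reported degradation as a relative (rather than absolute) drop is an assumption; the text here does not specify which is used.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: n samples drawn, c of them pass the tests."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every size-k subset contains a passing one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Average pass@1 over problems; each entry is (samples drawn, samples passing)."""
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


def relative_drop(clean: float, adversarial: float) -> float:
    """Relative performance drop in percent (assumed definition, not taken from the paper)."""
    if clean <= 0:
        raise ValueError("clean score must be positive")
    return 100.0 * (clean - adversarial) / clean


# Toy data (hypothetical): one greedy sample per problem, four problems.
clean_results = [(1, 1), (1, 1), (1, 0), (1, 1)]
adversarial_results = [(1, 1), (1, 0), (1, 0), (1, 1)]

clean_score = mean_pass_at_1(clean_results)
adversarial_score = mean_pass_at_1(adversarial_results)
drop = relative_drop(clean_score, adversarial_score)
print(f"clean={clean_score:.2f} adversarial={adversarial_score:.2f} drop={drop:.1f}%")
```

With one sample per problem, $pass@1$ reduces to the fraction of problems solved, so the toy scores are simply 3 of 4 and 2 of 4.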
