AI Code Generators for Security: Friend or Foe?
Roberto Natella, Pietro Liguori, Cristina Improta, Bojan Cukic, Domenico Cotroneo
TL;DR
The paper examines the dual-use nature of AI code generators in security and proposes a security-focused benchmark for evaluating the generation of synthetic attacks. It introduces the violent-python dataset, with 1,372 pairs of natural-language (NL) intents and Python code covering offensive security tasks, and fine-tunes CodeBERT on it. The fine-tuned model is compared with public code generators (GitHub Copilot, Amazon CodeWhisperer) using the edit-distance metric $ED\in [0,1]$. Results show that domain-specific fine-tuning substantially improves the alignment between generated and reference code, especially at the level of individual lines; for larger units (blocks, functions) the gains are smaller, and public generators used without fine-tuning can be competitive. The work underscores the need for security-oriented corpora to enable robust benchmarking, outlines when and how AI code generators can aid defenders, and offers practical guidance on tool selection depending on data availability and task granularity. Overall, the benchmark and findings can guide future research and responsible deployment of AI code generators in security contexts, balancing support for defenders against the risk of misuse.
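To make the metric concrete, the following is a minimal sketch of one common way to obtain an edit-distance score in $[0,1]$. It assumes that ED is the character-level Levenshtein distance normalized by the length of the longer string and subtracted from 1, so that 1 means an exact match. The summary above does not give the paper's exact formula, so this normalization and the function names `levenshtein` and `ed_score` are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def ed_score(generated: str, reference: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string.
    Returns 1.0 for identical strings and 0.0 when no character aligns."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(generated, reference) / longest


if __name__ == "__main__":
    # Hypothetical reference and generated lines, for illustration only.
    reference = "s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)"
    generated = "sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)"
    print(f"ED = {ed_score(generated, reference):.3f}")
```

Under this assumption, a higher score means the generated code is closer, character by character, to the reference. The score reflects textual similarity only, so two snippets that behave identically can still score below 1.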
Abstract
Recent advances in artificial intelligence (AI) code generators are opening new opportunities in software security research, including misuse by malicious actors. We review use cases for AI code generators for security and introduce an evaluation benchmark.
