Robustness, Security, Privacy, Explainability, Efficiency, and Usability of Large Language Models for Code
Zhou Yang, Zhensu Sun, Terry Yue Zhuo, Premkumar Devanbu, David Lo
TL;DR
This systematic review identifies seven properties beyond accuracy, including robustness, security, privacy, explainability, efficiency, and usability, across 146 studies on LLMs for code, and details how these properties are evaluated and enhanced. It maps current methodologies, highlights gaps (notably in scalable defense, explainability for generation tasks, and usability across diverse tasks), and proposes data-, human-, and system-centric perspectives to guide future research. The work offers a structured blueprint for improving trustworthy, efficient, and usable code-focused language models and informs directions for benchmarks and practical tooling. Together, these insights support safer deployment and broader adoption of LLM4Code in real-world software engineering workflows.
Abstract
Large language models for code (LLM4Code), which demonstrate strong performance (e.g., high accuracy) in processing source code, have significantly transformed software engineering. Many studies separately investigate the non-functional properties of LLM4Code, but there is no systematic review of how these properties are evaluated and enhanced. This paper fills this gap by thoroughly examining 146 relevant studies, thereby presenting the first systematic literature review to identify seven important properties beyond accuracy, including robustness, security, privacy, explainability, efficiency, and usability. We discuss the current state-of-the-art methods and trends, identify gaps in existing research, and present promising directions for future study.
