Do Code LLMs Understand Design Patterns?
Zhenyu Pan, Xuefeng Song, Yunkun Wang, Rongyu Cao, Binhua Li, Yongbin Li, Han Liu
TL;DR
The paper investigates whether Code LLMs understand and adhere to software design patterns, and how their biases affect downstream tasks. It introduces a three-task evaluation (design pattern classification, line completion, and function generation) covering 12 patterns in Python and Java, built from 48 GitHub repositories. The findings show substantial room for improvement: even top models reach only about 38.8% accuracy in pattern classification and show systematic misclassification involving Singleton and Factory, Facade remains challenging, and the generation tasks show a nuanced CS vs ES relationship. The work highlights practical implications for reliability and developer workload, and suggests richer training data and expanded evaluation to better instill design-pattern conformity in Code LLMs.
Abstract
Code Large Language Models (LLMs) demonstrate great versatility in adapting to various downstream tasks, including code generation and completion, as well as bug detection and fixing. However, Code LLMs often fail to capture existing coding standards, and so generate code that conflicts with the design patterns a given project requires. As a result, developers must post-process the generated code to adapt it to the project's design norms. In this work, we empirically investigate the biases of Code LLMs in software development. Through carefully designed experiments, we assess the models' understanding of design patterns across recognition, comprehension, and generation. Our findings reveal that biases in Code LLMs significantly affect the reliability of downstream tasks.
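
To make the recognition task concrete, the sketch below shows one plausible way to score design-pattern classification: overall accuracy plus a count of which patterns are mistaken for which. It is not the paper's evaluation harness. The function name `score_pattern_classification` and the example labels are hypothetical, invented here purely for illustration.

```python
from collections import Counter


def score_pattern_classification(gold, predicted):
    """Return (accuracy, confusions) for two parallel lists of pattern labels.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("need at least one example")
    correct = sum(g == p for g, p in zip(gold, predicted))
    # Count only the mistakes, keyed by (true label, predicted label).
    confusions = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    return correct / len(gold), confusions


# Made-up labels (assumption): these are NOT results from the paper.
gold = ["Singleton", "Factory", "Facade", "Observer"]
predicted = ["Singleton", "Singleton", "Factory", "Observer"]

accuracy, confusions = score_pattern_classification(gold, predicted)
print(f"accuracy = {accuracy:.2f}")
for (true_label, guess), count in confusions.most_common():
    print(f"{true_label} predicted as {guess}: {count}")
```

Keeping the confusion counts alongside accuracy is what would let a systematic bias toward particular patterns show up, rather than being hidden inside a single aggregate number.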
