Beyond Functional Correctness: Investigating Coding Style Inconsistencies in Large Language Models

Yanlin Wang; Tianyue Jiang; Mingwei Liu; Jiachi Chen; Mingzhi Mao; Xilin Liu; Yuchi Ma; Zibin Zheng

TL;DR

This paper pioneers the empirical study of coding style differences between mainstream Code LLMs and human developers. It builds a taxonomy of 24 coding-style inconsistency types across five dimensions and analyzes 1,179 Python samples from five LLMs against the ground truth of the corresponding CoderEval tasks, assessing readability, conciseness, and robustness. The findings show significant style gaps, especially in statements/expressions and formatting: LLM-generated code largely resembles human code in overall quality but differs in API usage and formatting choices. Prompting techniques yield limited improvements that sometimes come with trade-offs, indicating that aligning coding style requires approaches beyond simple prompts.

Abstract

Large language models (LLMs) have brought a paradigm shift to the field of code generation, offering the potential to enhance the software development process. However, previous research has mainly focused on the accuracy of code generation, while coding style differences between LLMs and human developers remain under-explored. In this paper, we empirically analyze the differences in coding style between code generated by mainstream Code LLMs and code written by human developers, and summarize a taxonomy of coding style inconsistencies. Specifically, we first summarize the types of coding style inconsistencies by manually analyzing a large number of generation results. We then compare the code generated by Code LLMs with the code written by human programmers in terms of readability, conciseness, and robustness. The results reveal that LLMs and developers have different coding styles. Additionally, we study the possible causes of these inconsistencies and provide some solutions to alleviate the problem.

Paper Structure (24 sections, 14 figures, 1 table)

Introduction
Related Work
LLM-based Code Generation
Coding Style
Experimental Setup
LLM Selection
Benchmark Selection
Implementation Details
Evaluation
RQ1: Coding Style Inconsistency Identification
Data Collection
Data Annotation
Taxonomy
RQ2: Coding Style Inconsistency Analysis
Percentages of Inconsistent Coding Styles
...and 9 more sections

Figures (14)

Figure 1: An Example of Style-Consistent Implementation.
Figure 2: An Example of Incorrect Implementation that Passed Unit Tests.
Figure 3: Dimensions and Corresponding Coding Style Inconsistency Types.
Figure 4: Percentages of Inconsistent Coding Styles.
Figure 5: Inconsistency Numbers in a Single Code Sample.
...and 9 more figures
