Exploring the Effect of Multiple Natural Languages on Code Suggestion Using GitHub Copilot

Kei Koyanagi; Dong Wang; Kotaro Noguchi; Masanari Kondo; Alexander Serebrenik; Yasutaka Kamei; Naoyasu Ubayashi

TL;DR

The study investigates whether the natural language of a prompt biases GitHub Copilot's code suggestions, evaluating prompts in English, Japanese, and Chinese. Using 756 AtCoder questions from 189 contests, the authors measure correctness against AtCoder test cases and analyze the results with one-way ANOVA. Performance depends on the prompt language (Japanese > English > Chinese), and accuracy drops consistently as problem difficulty increases. The work highlights the need to consider natural-language bias in AI-assisted programming and provides a foundation for multilingual evaluation and future improvements in Copilot and similar tools. In practice, developers and researchers should account for language effects when using code-synthesis models across diverse user populations.
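As a rough illustration of the one-way ANOVA comparison mentioned above, the sketch below computes the F statistic for per-question correctness scores grouped by prompt language. It is a minimal sketch under stated assumptions, not the authors' analysis pipeline: the function name `one_way_anova_f`, the 0/1 scoring, and the score lists are hypothetical placeholders, and none of the numbers come from the study.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`.

    `groups` is a list of lists of numeric scores, one list per condition.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical placeholder scores (1 = accepted, 0 = rejected).
# These are made-up values for illustration, NOT results from the study.
scores = {
    "English": [1, 0, 1, 1, 0, 1],
    "Japanese": [1, 1, 1, 0, 1, 1],
    "Chinese": [0, 0, 1, 0, 1, 0],
}

f_stat, df_b, df_w = one_way_anova_f(list(scores.values()))
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

A large F relative to the F distribution with the reported degrees of freedom would suggest that mean correctness differs across languages. Turning F into a p-value needs the F distribution's survival function, which a statistics library would normally provide.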

Abstract

GitHub Copilot is an AI-enabled tool that automates program synthesis. It has gained significant attention since its launch in 2021. Recent studies have extensively examined Copilot's capabilities in various programming tasks, as well as its security issues. However, little is known about the effect of different natural languages on code suggestion. Natural language is considered a source of social bias in the field of NLP, and this bias could impact the diversity of software engineering. To address this gap, we conducted an empirical study to investigate the effect of three popular natural languages (English, Japanese, and Chinese) on Copilot. We used 756 questions of varying difficulty levels from AtCoder contests for evaluation. The results highlight that Copilot's capability varies across natural languages, with Chinese achieving the worst performance. Furthermore, regardless of the natural language, performance decreases significantly as the difficulty of the questions increases. Our work represents an initial step toward understanding the role of natural languages in Copilot's capability and introduces promising opportunities for future work.
