Context Composing for Full Line Code Completion
Anton Semenkin, Yaroslav Sokolov, Evgeniia Vu
TL;DR
This paper tackles on-device full-line code completion for IDEs, where privacy and latency constraints call for compact, context-aware generation. It introduces context composing (a form of prompt engineering) for GPT-/LLaMA-like autoregressive models, using indentation-aware tokens and long-token BPE to fit within a limited context window. In online A/B tests in PyCharm Pro, Full Line Code Completion (FLCC) increased the ratio of code completed by about 1.5x without slowing the IDE and received positive user feedback, validating the approach while highlighting room for richer contextual signals. The authors plan to expand the context length to 1536 tokens, explore retrieval-augmented techniques (RETRO-like) and other data-driven context enhancements, and pursue privacy-preserving data collection and collaboration to advance neural code completion for end-user devices.
Abstract
Code completion is one of the most used Integrated Development Environment (IDE) features and affects the everyday life of a software developer. Modern code completion approaches have moved from composing several static analysis-based contributors to pipelines that involve neural networks. This change makes it possible to propose longer code suggestions while keeping generation time relatively short. At JetBrains, we put a lot of effort into perfecting the code completion workflow so that it is both helpful and non-distracting for a programmer. We shipped the Full Line Code Completion feature in the PyCharm Pro IDE and demonstrated its usefulness in A/B testing on hundreds of real Python users. This paper describes our approach to context composing for the Transformer model at the core of the feature's implementation. In addition, we share our next steps for improving the feature and emphasize the importance of several research aspects in the area.
