Full Line Code Completion: Bringing AI to Desktop

Anton Semenkin; Vitaliy Bibaev; Yaroslav Sokolov; Kirill Krylov; Alexey Kalina; Anna Khannanova; Danila Savenkov; Darya Rovdo; Igor Davidenko; Kirill Karnaukhov; Maxim Vakhrushev; Mikhail Kostyukov; Mikhail Podvitskii; Petr Surkov; Yaroslav Golubev; Nikita Povarov; Timofey Bryksin

TL;DR

This work tackles the challenge of delivering fast, privacy-preserving multi-token code completion directly on end-user machines. It presents Full Line Code Completion (FLCC), a local, Transformer-based approach that uses a 100M-parameter LLaMA-like model, a modified tokenization scheme, and a dynamic beam-search algorithm with token healing to generate full, syntactically correct lines. The solution is integrated as a Kotlin-based IntelliJ plugin with a native C++ inference server and is evaluated through offline tests and industry-grade online A/B experiments, which show that 1.3 times more Python code in the IDE is produced by code completion, with competitive latency of around 150 ms. The study demonstrates practical deployment of research-based AI in production IDEs, discusses integration challenges, and outlines future work across languages and UX refinements.

Abstract

In recent years, several industrial solutions to the problem of multi-token code completion have appeared, each making a great advance in the area but mostly focusing on cloud-based runtime and avoiding working on the end user's device. In this work, we describe our approach to building a multi-token code completion feature for JetBrains' IntelliJ Platform, which we call Full Line Code Completion. The feature suggests only syntactically correct code and works fully locally, i.e., data querying and the generation of suggestions happen on the end user's machine. We share important time and memory-consumption restrictions, as well as design principles that a code completion engine should satisfy. Working entirely on the end user's device, our code completion engine enriches user experience while being not only fast and compact but also secure. We share a number of useful techniques to meet the stated development constraints and also describe offline and online evaluation pipelines that allowed us to make better decisions. Our online evaluation shows that the usage of the tool leads to 1.3 times more Python code in the IDE being produced by code completion. The described solution was started with the help of researchers and was then bundled into all JetBrains IDEs, where it is now used by millions of users. Thus, we believe that this work is useful for bridging academia and industry, providing researchers with the knowledge of what happens when complex research-based solutions are integrated into real products.