Multi-line AI-assisted Code Authoring
Omer Dunay, Daniel Cheng, Adam Tait, Parth Thakkar, Peter C Rigby, Andy Chiu, Imad Ahmad, Arun Ganesan, Chandra Maddila, Vijayaraghavan Murali, Ali Tayyebi, Nachiappan Nagappan
TL;DR
This paper addresses the challenge of adding multi-line AI-assisted code suggestions to CodeCompose at Meta while keeping them useful and non-intrusive. It introduces a scope-based multi-line algorithm to minimize the 'jarring' effect of suggestions shifting the developer's existing code, coupled with client- and server-side latency reductions (including UI indicators, Flash Attention, CUDA graphs, and streaming) to improve display rates and keystroke savings. Through large-scale A/B deployments involving tens of thousands of engineers, the study shows that multi-line suggestions account for a disproportionately large share of accepted characters (42%, from only 16% of displayed suggestions) and raise keystroke savings from 9% to 17%, with opt-out rates below 1%. The work contributes a concrete production blueprint for deploying long, context-rich completions in an enterprise setting, including measurable improvements in throughput and user experience. Overall, the findings support the viability and value of multi-line AI-assisted code authoring for large-scale software development teams.
Abstract
CodeCompose is an AI-assisted code authoring tool powered by large language models (LLMs) that provides inline suggestions to tens of thousands of developers at Meta. In this paper, we present how we scaled the product from displaying single-line suggestions to multi-line suggestions. This evolution required us to overcome several unique challenges in improving the usability of these suggestions for developers. First, we discuss how multi-line suggestions can have a 'jarring' effect, as the LLM's suggestions constantly move the developer's existing code around, which, left unaddressed, would decrease productivity and satisfaction. Second, multi-line suggestions take significantly longer to generate; hence we present several innovative investments we made to reduce the perceived latency for users. These model-hosting optimizations reduced multi-line suggestion latency by 2.5x. Finally, we conduct experiments with tens of thousands of engineers to understand how multi-line suggestions impact the user experience and contrast this with single-line suggestions. Our experiments reveal that (i) multi-line suggestions account for 42% of total characters accepted (despite accounting for only 16% of displayed suggestions), and (ii) multi-line suggestions almost doubled the percentage of keystrokes saved for users, from 9% to 17%. Multi-line CodeCompose has been rolled out to all engineers at Meta, and less than 1% of engineers have opted out of multi-line suggestions.
