Multi-line AI-assisted Code Authoring

Omer Dunay; Daniel Cheng; Adam Tait; Parth Thakkar; Peter C Rigby; Andy Chiu; Imad Ahmad; Arun Ganesan; Chandra Maddila; Vijayaraghavan Murali; Ali Tayyebi; Nachiappan Nagappan

TL;DR

This paper addresses the challenge of adding multi-line AI-assisted code suggestions to CodeCompose at Meta while keeping them useful and non-intrusive. It introduces a scope-based multi-line algorithm that minimizes the "jarring" effect of suggestions shifting existing code, together with client- and server-side latency reductions (UI indicators, Flash Attention, CUDA graphs, and streaming) that improve display rates and keystroke savings. In large-scale A/B deployments involving tens of thousands of engineers, multi-line suggestions account for a disproportionately large share of accepted characters (42%), raise keystroke savings from 9% to 17%, and see opt-out rates below 1%. The work contributes a concrete production blueprint for deploying long, context-rich completions in an enterprise setting, including measurable improvements in throughput and user experience. Overall, the findings support the viability and value of multi-line AI-assisted code authoring for large software development teams.

Abstract

CodeCompose is an AI-assisted code authoring tool powered by large language models (LLMs) that provides inline suggestions to tens of thousands of developers at Meta. In this paper, we present how we scaled the product from displaying single-line suggestions to multi-line suggestions. This evolution required us to overcome several unique challenges in improving the usability of these suggestions for developers. First, we discuss how multi-line suggestions can have a "jarring" effect, as the LLM's suggestions constantly move the developer's existing code around, which would otherwise result in decreased productivity and satisfaction. Second, multi-line suggestions take significantly longer to generate; hence we present several innovative investments we made to reduce the perceived latency for users. These model-hosting optimizations sped up multi-line suggestion latency by 2.5x. Finally, we conduct experiments on tens of thousands of engineers to understand how multi-line suggestions impact the user experience and contrast this with single-line suggestions. Our experiments reveal that (i) multi-line suggestions account for 42% of total characters accepted (despite accounting for only 16% of displayed suggestions) and (ii) multi-line suggestions almost doubled the percentage of keystrokes saved for users, from 9% to 17%. Multi-line CodeCompose has been rolled out to all engineers at Meta, and less than 1% of engineers have opted out of multi-line suggestions.

Paper Structure (24 sections, 8 figures, 1 table)

This paper contains 24 sections, 8 figures, 1 table.

Introduction
Background and Methodology
Meta
CodeCompose at Meta
Measures for Evaluating CodeCompose
Addressing Challenge 1
Definition of the "jarring effect"
Approach to address the "Jarring Effect"
Technical Implementation of the Strategy
Addressing Challenge 2
Improvements in the editor client extension (i.e. VSCode / Bento Notebooks)
Optimizations to the model-hosting service
Addressing Challenge 3
Experiments for Release of Multi-line Suggestions
User Feedback and Opt-out Rate
...and 9 more sections

Figures (8)

Figure 1: Single-line "jarring" effect example: The user's cursor is positioned between the "def" keyword and the "quicksort" function name; the inline suggestion appears and moves the existing user code to the right.
Figure 2: Example showing the multi-line "jarring" effect: The user's cursor was between a function name and the next line containing the statement "test1 = 1". When the suggestion appears, the existing line is pushed down, disrupting the developer's flow and forcing them to review the suggested "quicksort" function while also determining the correct location of their existing code.
Figure 3: Examples showing the pre-processing stage: Deciding, based on the cursor position, which type of suggestion should be displayed.
Figure 4: System architecture of CodeCompose: A client editor that surfaces the suggestions and a language server that mediates requests with the CodeCompose model service host. A "multi-line" flag is passed to the model service in the request.
Figure 5: Example showing the post-processing stage: The cursor is in the scope of the "foo" function. Although the model returns a multi-line suggestion covering both the "foo" and "foo2" functions, post-processing removes the code in the red box and displays only the suggestion for the in-scope "foo" function to the user.
...and 3 more figures
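
The post-processing step described in the Figure 5 caption can be illustrated with a minimal Python sketch. This is not the CodeCompose implementation, which is not shown on this page. It assumes the cursor's scope can be approximated by Python indentation, and the function name `truncate_to_scope` and the parameter `scope_indent` are hypothetical names chosen for illustration.

```python
def truncate_to_scope(suggestion: str, scope_indent: int) -> str:
    """Keep only the part of a multi-line suggestion inside the cursor's scope.

    Assumption (illustrative only): scope is approximated by indentation.
    `scope_indent` is the indentation, in spaces, of the line that opens the
    enclosing scope (for example the `def foo():` line). Any later non-blank
    line indented at or below that level is treated as leaving the scope.
    """
    lines = suggestion.splitlines()
    if not lines:
        return ""

    # The first line continues the line the cursor is already on, so it is
    # always kept regardless of its own leading whitespace.
    kept = [lines[0]]
    for line in lines[1:]:
        indent = len(line) - len(line.lstrip(" "))
        if line.strip() and indent <= scope_indent:
            break  # out of scope: drop this line and everything after it
        kept.append(line)

    # Remove blank lines left dangling at the end of the kept region.
    while kept and not kept[-1].strip():
        kept.pop()
    return "\n".join(kept)


# The cursor sits in the body of `def foo():` (indentation 0). The model
# output continues the body of foo and then starts a new function, foo2.
model_output = "x = 1\n    return x\n\ndef foo2():\n    return 2"
shown = truncate_to_scope(model_output, scope_indent=0)
```

Here `shown` contains only the two lines that belong to `foo`; the `foo2` definition is discarded, mirroring the red box in Figure 5. A production system would more plausibly rely on a language-aware parser than on indentation, which this sketch uses only to stay short.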
