Insights from the Usage of the Ansible Lightspeed Code Completion Service

Priyam Sahoo; Saurabh Pujar; Ganesh Nalawade; Richard Gebhardt; Louis Mandel; Luca Buratti

TL;DR

This paper presents the design and implementation of Ansible Lightspeed, an LLM-based service designed explicitly to generate Ansible YAML from a natural language prompt, and provides insights into the effectiveness of small, dedicated models in a domain-specific context.

Abstract

The availability of Large Language Models (LLMs) that can generate code has made it possible to create tools that improve developer productivity. Integrated development environments (IDEs), which developers use to write software, often serve as the interface for interacting with LLMs. Although many such tools have been released, almost all of them focus on general-purpose programming languages. Domain-specific languages, such as those crucial for Information Technology (IT) automation, have not received much attention. Ansible is one such YAML-based language for IT automation. Ansible Lightspeed is an LLM-based service designed explicitly to generate Ansible YAML from a natural language prompt. In this paper, we present the design and implementation of the Ansible Lightspeed service. We then evaluate its utility to developers using diverse indicators, including extended utilization, analysis of user-edited suggestions, and user sentiment analysis. The evaluation is based on data collected from 10,696 real users, including 3,910 returning users. The code for the Ansible Lightspeed service and the analysis framework is made available for others to use. To our knowledge, our study is the first to involve thousands of users of code assistants for domain-specific languages. Ansible Lightspeed is also the first code completion tool to report N-Day user retention figures; its retention is 13.66% on Day 30. We propose an improved version of the user acceptance rate, called the Strong Acceptance rate, where a suggestion is considered accepted only if less than 50% of it is edited and these edits do not change critical parts of the suggestion. By focusing on Ansible, Lightspeed achieves a Strong Acceptance rate of 49.08% for multi-line Ansible task suggestions. With our findings, we provide insights into the effectiveness of small, dedicated models in a domain-specific context.
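The Strong Acceptance criterion described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's analysis framework: it assumes the edited share is approximated by a character-level `difflib` similarity, that the "critical part" of a suggestion is its module name (the first key of the generated task), and that a rejected suggestion is recorded as `None`. The helper names `module_name`, `edited_fraction`, `is_strongly_accepted`, and `strong_acceptance_rate` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def module_name(task_body: str) -> str:
    """Return the first key of a task body, assumed here to be the module name."""
    for line in task_body.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return stripped.split(":", 1)[0].strip()
    return ""


def edited_fraction(suggestion: str, final: str) -> float:
    """Approximate the share of the suggestion that was edited (0.0 = untouched).

    Assumption: 1 - SequenceMatcher.ratio() stands in for the edit measure.
    """
    return 1.0 - SequenceMatcher(None, suggestion, final).ratio()


def is_strongly_accepted(suggestion: str, final: Optional[str]) -> bool:
    """Accepted only if kept, edited by less than 50%, and the module is unchanged."""
    if final is None:  # the user rejected the suggestion outright
        return False
    same_module = module_name(suggestion) == module_name(final)
    return same_module and edited_fraction(suggestion, final) < 0.5


def strong_acceptance_rate(pairs: Iterable[Tuple[str, Optional[str]]]) -> float:
    """Share of (suggestion, final text) pairs that count as strongly accepted."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(is_strongly_accepted(s, f) for s, f in pairs) / len(pairs)


suggestion = "ansible.builtin.package:\n  name: nginx\n  state: present\n"
minor_edit = "ansible.builtin.package:\n  name: nginx\n  state: latest\n"
module_swap = "ansible.builtin.apt:\n  name: nginx\n  state: present\n"

rate = strong_acceptance_rate(
    [(suggestion, suggestion), (suggestion, minor_edit),
     (suggestion, module_swap), (suggestion, None)]
)
print(rate)
```

In this toy sample, the untouched and lightly edited suggestions count as strongly accepted, while the module swap and the rejection do not, even though the module swap changes only a few characters.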

Paper Structure (25 sections, 11 figures, 2 tables)

Introduction
Ansible Lightspeed
Code Editor Extension
Inference Pipeline
Pre-processing
Inference
Post-processing
Content Matching Pipeline
Data Collection and Analysis Framework
Usage analyser
Suggestion analyzer
Analysis of User Interactions
Temporal Analysis
User Retention
Edit Analysis
...and 10 more sections

Figures (11)

Figure 1: A typical Ansible playbook structure consists of a playbook definition and a list of tasks. Ansible Lightspeed generates a task, given the task name and the preceding context. The module name is the first term generated by Lightspeed, followed by the keys and values associated with that module.
Figure 2: Ansible Lightspeed's workflow in the text editor. Users receive Lightspeed suggestions, which are almost always multi-line, after entering the task name and moving to the next line. Then, users can either accept the suggestion by pressing the ‘Tab’ key or reject it by pressing the ‘Esc’ key.
Figure 3: Ansible Lightspeed application architecture. The components are: (A) the user interface, (B) the inference pipeline, (C) the content matching pipeline, (D) the analysis framework, and (E) supporting services. The arrows indicate the information flow and the numbers indicate the processing order.
Figure 4: Ansible Lightspeed Feedback User Interface. Users can rate their experience using emojis, which we call star-rating. The rightmost emoji indicates the best experience, or a 5-star rating, and the leftmost emoji indicates the worst experience, or a 1-star rating. Users are also expected to write a few words explaining their rating in the "Tell us why?" section.
Figure 5: Architecture of the suggestion analyser.
...and 6 more figures
