Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code

Nicolas Dupuis, Luca Buratti, Sanjay Vishwakarma, Aitana Viudes Forrat, David Kremer, Ismael Faro, Ruchir Puri, Juan Cruz-Benito

TL;DR

The study addresses the challenge of generating reliable quantum code by training a domain-focused Code LLM, built on Granite-20b-code, with extensive Qiskit data and instruction tuning. It introduces the Qiskit Code Assistant and a Qiskit-specific benchmark (Qiskit HumanEval) for evaluating quantum-code generation. Results show that extended pretraining yields a Qiskit HumanEval pass rate of approximately 46.5%, surpassing all baselines while maintaining strong performance on standard coding benchmarks. The work demonstrates the practical potential of specialized Code LLMs for quantum computing and outlines public release plans and future enhancements to keep pace with the evolving quantum computing ecosystem.

Abstract

Code Large Language Models (Code LLMs) have emerged as powerful tools, revolutionizing the software development landscape by automating the coding process and reducing the time and effort required to build applications. This paper focuses on training Code LLMs to specialize in the field of quantum computing. We begin by discussing the unique needs of quantum computing programming, which differ significantly from classical programming approaches or languages. A Code LLM specializing in quantum computing requires a foundational understanding of quantum computing and quantum information theory. However, two factors present significant challenges: the scarcity of available quantum code examples, and the rapid evolution of the field, which necessitates continuous dataset updates. We then discuss our work on training Code LLMs to produce high-quality quantum code using the Qiskit library. This work includes an examination of the various aspects of the LLMs used for training and the specific training conditions, as well as the results obtained with our current models. To evaluate our models, we have developed a custom benchmark, similar to HumanEval, which includes a set of tests specifically designed for quantum computing programming with Qiskit. Our findings indicate that our model outperforms existing state-of-the-art models in quantum computing tasks. We also provide examples of code suggestions, comparing our model to other relevant Code LLMs. Finally, we discuss the potential benefits of Code LLMs for computational scientists, researchers, and practitioners in quantum computing, and explore various features and future work that could be relevant in this context.
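To make the idea of a HumanEval-style benchmark more concrete, the following is a minimal sketch of how a pass rate over such tasks could be computed. It is not the authors' evaluation harness: the `Task` structure, the `passes` and `pass_rate` helpers, and the toy task are all hypothetical, and the toy task uses plain Python so the sketch runs without Qiskit installed. In a Qiskit-specific benchmark the prompts and tests would exercise Qiskit code instead.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical HumanEval-style task (names are illustrative assumptions)."""
    prompt: str       # imports, function header, and docstring shown to the model
    test: str         # source defining check(candidate) with assertions
    entry_point: str  # name of the function the test calls


def passes(task: Task, completion: str) -> bool:
    """Return True if prompt + completion satisfies the task's unit test.

    Note: exec() on model output is unsafe outside a sandbox; a real
    harness would isolate execution and enforce timeouts.
    """
    namespace = {}
    try:
        exec(task.prompt + completion, namespace)  # define the candidate function
        exec(task.test, namespace)                 # define check()
        namespace["check"](namespace[task.entry_point])
        return True
    except Exception:
        return False


def pass_rate(tasks, completions) -> float:
    """Fraction of tasks whose single completion passes its test."""
    results = [passes(t, c) for t, c in zip(tasks, completions)]
    return sum(results) / len(results)


# Toy task standing in for a benchmark item (assumption, not from the paper).
toy = Task(
    prompt=(
        "def num_basis_states(n_qubits):\n"
        '    """Return the number of computational basis states of n_qubits qubits."""\n'
    ),
    test=(
        "def check(candidate):\n"
        "    assert candidate(1) == 2\n"
        "    assert candidate(3) == 8\n"
    ),
    entry_point="num_basis_states",
)

good_completion = "    return 2 ** n_qubits\n"
bad_completion = "    return 2 * n_qubits\n"

rate = pass_rate([toy, toy], [good_completion, bad_completion])
```

Under this scheme a completion counts as correct only if every assertion in the task's test holds, so a reported pass rate is the share of benchmark tasks solved by the generated code.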

Paper Structure

This paper contains 7 sections, 1 figure, and 2 tables.

Figures (1)

  • Figure 1: Output samples generated with granite-20b-code-qk and deepseek-coder-33b-base. In (a) and (b), the models are prompted with an instruction written as a Python comment, while in (c) and (d), the models are prompted with the import statements, a function header, and a Python docstring.
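
The two prompt styles named in the Figure 1 caption can be illustrated as follows. These strings are hypothetical examples written for this sketch, not the prompts shown in the figure; the function name `bell_circuit` and the task wording are assumptions.

```python
# Style of panels (a) and (b): an instruction written as a Python comment.
comment_prompt = "# Create a two-qubit circuit that prepares a Bell state\n"

# Style of panels (c) and (d): import statements, a function header,
# and a docstring; the model is expected to complete the function body.
docstring_prompt = (
    "from qiskit import QuantumCircuit\n"
    "\n"
    "def bell_circuit() -> QuantumCircuit:\n"
    '    """Return a two-qubit circuit that prepares a Bell state."""\n'
)
```

The second style gives the model a fixed signature and return type, which is also the form a HumanEval-style test needs in order to call the generated function.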