QiboAgent: a practitioner's guideline to open source assistants for Quantum Computing code development

Lorenzo Esposito, Andrea Papaluca, Stefano Carrazza

Abstract

We introduce QiboAgent, a reference implementation designed to serve as a practitioner's guideline for developing specialized coding assistants in Quantum Computing middleware. To address the limitations of general-purpose proprietary models in scientific software development, we explore how lightweight, open-source Large Language Models (LLMs) equipped with a custom workflow architecture compare against them. In detail, we experiment with two complementary paradigms: a Retrieval-Augmented Generation (RAG) pipeline for high-precision information retrieval, and an autonomous agentic workflow for complex software engineering tasks. We observe that this hybrid approach significantly reduces hallucination rates in code generation compared to a proprietary baseline, achieving a peak accuracy of 90.2% with relatively small open-source models of up to 30B parameters. Furthermore, the agentic framework exhibits advanced coding capabilities: it automates the resolution of maintenance issues and new feature requests, and it prototypes larger-scale refactors of the codebase, such as producing a compiled Rust module with bindings from an originally pure Python package (Qibo, in our case). The LLM workflows used for our analysis are integrated into a user interface and a Model Context Protocol server, providing an accessible tool for Qibo developers.

Paper Structure

This paper contains 23 sections, 2 equations, 9 figures, and 1 table.

Figures (9)

  • Figure 1: Multi-role operational modalities of QiboAgent. The framework encompasses two distinct operational layers: (Left) a RAG-based assistance layer providing context-aware support for users and developers (e.g., interactive Q&A and automated documentation); (Right) an autonomous Agentic layer designed for repository maintenance, enabling proactive GitHub issue resolution and structural code refactoring.
  • Figure 2: RAG Pipeline Architecture. The process begins with the construction of a knowledge base by filtering code files and documentation from the GitHub repository. During the ingestion phase, documents are segmented into chunks and embedded into a vector space. At inference time, the user's query is vectorized and used to retrieve relevant context, which is then injected into the LLM's prompt to condition generation.
  • Figure 3: Schematic representation of the agentic workflow for automated issue resolution. The diagram illustrates the iterative ReAct loop, where the agent utilizes a specialized toolset to analyze issue specifications, navigate the repository structure, and synthesize code patches.
  • Figure 4: Schematic representation of the multi-agent architecture for the development of the Qibo-core module. The diagram illustrates the sequential workflow, where each agent is responsible for a specific layer of the software stack, from Rust implementation to Python bindings and validation.
  • Figure 5: Code generation accuracy on the 50-question benchmark. We compare six LLMs across three retrieval configurations (No-RAG, Semantic Search, Hybrid Search) against the commercial Google Native UI baseline ($56.9\%$). The Hybrid Retrieval strategy consistently yields the highest performance, peaking at $90.2\%$ with qwen3-coder:30b. Notably, this setup allows lightweight open-source models ($20\text{--}30$B parameters) to significantly outperform the general-purpose commercial baseline, demonstrating the dominance of domain-specific context over raw model scale.
  • ...and 4 more figures
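
The RAG flow described in the Figure 2 caption (chunk, embed, retrieve, inject into the prompt) can be illustrated with a minimal, standard-library-only sketch. This is not QiboAgent's implementation: the function names (`chunk`, `embed`, `retrieve`, `build_prompt`), the chunk sizes, the prompt wording, and the sample documents are all hypothetical, and the bag-of-words vector is only a stand-in for whatever embedding model a real pipeline would use.

```python
import math
import re
from collections import Counter


def chunk(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative assumptions)."""
    words = text.split()
    step = size - overlap
    stop = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + size]) for i in range(0, stop, step)]


def embed(text):
    """Toy bag-of-words vector; a real pipeline would call an embedding model here."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context):
    """Inject the retrieved chunks into the prompt that conditions generation."""
    joined = "\n---\n".join(context)
    return (
        "Use only the context below to answer.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Made-up documentation snippets standing in for a filtered knowledge base.
    docs = [
        "Hadamard gate creates superposition on a qubit",
        "The circuit class collects gates and measurements",
        "Installation uses pip and a virtual environment",
    ]
    chunks = [piece for doc in docs for piece in chunk(doc)]
    question = "how do I add gates to a circuit"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In practice the in-memory list would be replaced by a vector store populated once during ingestion, so that only the query has to be embedded at inference time.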