Architecture for a Trustworthy Quantum Chatbot
Yaiza Aragonés-Soria, Manuel Oriol
TL;DR
This work introduces C4Q 2.0, an architecture for a trustworthy quantum chatbot that couples classification and question-answering (QA) LLMs with a deterministic engine to deliver reliable quantum programming assistance. It adds ready-to-run Qiskit code, supports quantum-augmented solutions to the travelling salesperson problem (TSP) and the knapsack problem (KP), and includes a user feedback mechanism that drives iterative improvements while maintaining auditability. Empirical evaluations show near-perfect classification accuracy and robust but imperfect QA performance. A head-to-head comparison with three baseline chatbots across different Qiskit versions indicates superior maintainability and correctness. The approach demonstrates how modular design and template-based code generation can address trust, explainability, and maintainability challenges in specialized AI tools for quantum software engineering, with practical impact on education and development in the field.
Abstract
Large language model (LLM)-based tools such as ChatGPT seem useful for classical programming assignments. The more specialized the field, however, the less reliable they tend to be, because less data is available to train them. In quantum computing, the quality of answers from generic chatbots is low. C4Q is a chatbot focused on quantum programs that addresses this challenge through a software architecture that integrates specialized LLMs for classifying requests and specialized question-answering modules with a deterministic logical engine to provide trustworthy quantum computing support. This article describes the latest version (2.0) of C4Q, which delivers several enhancements: ready-to-run Qiskit code for gate definitions and circuit operations, expanded features to solve software engineering tasks such as the travelling salesperson problem and the knapsack problem, and a feedback mechanism for iterative improvement. Extensive testing of the backend confirms the system's reliability, while empirical evaluations show that C4Q 2.0's classification LLM reaches near-perfect accuracy. The evaluation consists of a comparative study with three existing chatbots that highlights C4Q 2.0's maintainability and correctness and reflects on how software architecture decisions, such as separating deterministic logic from probabilistic text generation, impact the quality of the results.
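The separation of deterministic logic from probabilistic text generation described above can be illustrated with a minimal sketch. It is not the C4Q implementation: the category names, the keyword-based `classify` and `extract_gate` stand-ins for the classification and QA LLMs, the `Answer` type, and the template text are all assumptions made for illustration. The point it shows is that the models only select a category and a parameter, while the code returned to the user comes from a fixed template.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request categories; the actual C4Q label set is not stated here.
GATE_DEFINITION = "gate_definition"
UNSUPPORTED = "unsupported"

# Hypothetical mapping from gate names to Qiskit QuantumCircuit method names.
GATE_METHODS = {"hadamard": "h", "pauli-x": "x", "pauli-z": "z"}

# Fixed template: the emitted code never contains free-form model output.
QISKIT_TEMPLATE = (
    "from qiskit import QuantumCircuit\n"
    "qc = QuantumCircuit(1)\n"
    "qc.{method}(0)\n"
    "print(qc.draw())\n"
)


@dataclass
class Answer:
    category: str
    text: str
    code: Optional[str] = None


def classify(request: str) -> str:
    """Stand-in for the classification LLM (keyword match, illustration only)."""
    lowered = request.lower()
    if any(gate in lowered for gate in GATE_METHODS):
        return GATE_DEFINITION
    return UNSUPPORTED


def extract_gate(request: str) -> Optional[str]:
    """Stand-in for the QA LLM that extracts the gate name from the request."""
    lowered = request.lower()
    for gate in GATE_METHODS:
        if gate in lowered:
            return gate
    return None


def answer(request: str) -> Answer:
    """Deterministic engine: validate the extracted parameter, fill the template."""
    category = classify(request)
    gate = extract_gate(request) if category == GATE_DEFINITION else None
    if gate is None:
        # Refuse rather than let a model improvise an answer.
        return Answer(UNSUPPORTED, "This request is outside the supported scope.")
    code = QISKIT_TEMPLATE.format(method=GATE_METHODS[gate])
    return Answer(category, f"Qiskit code applying the {gate} gate:", code)


if __name__ == "__main__":
    result = answer("Please define the Hadamard gate")
    print(result.text)
    print(result.code)
```

Because the generated code depends only on the template and a validated lookup table, adapting to a new Qiskit version in such a design would mean editing the template rather than retraining a model, which is one plausible reading of the maintainability argument made in the abstract.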
