Code Comments for Quantum Software Development Kits: An Empirical Study on Qiskit
Zenghui Zhou, Yuechen Li, Yi Cai, Jinlong Wen, Xiaohan Yu, Zheng Zheng, Beibei Yin
TL;DR
This paper introduces CC4Q, the first large-scale dataset of code comments for quantum software development kits, derived from Qiskit and annotated with both a classical developer-intent taxonomy and a novel quantum-specific taxonomy. It conducts a comprehensive empirical study from three perspectives (structure-based, developer-intent, and quantum-specific) to reveal how quantum concepts are expressed in code comments and how these comments differ from those in classical software. Key findings include a dominance of function-level documentation, substantial quantum-domain content in docstrings, and developer-intent patterns specific to quantum SDKs, such as increased how-to-use guidance and the prevalence of diagrams, formulas, and references in developer-others comments. The work provides actionable guidelines for writing high-quality quantum comments and establishes a foundation for future automatic analysis and generation of quantum software documentation.
Abstract
Quantum computing is gaining attention from academia and industry. With quantum Software Development Kits (SDKs), programmers can develop quantum software to explore the power of quantum computing. However, programmers may struggle to understand quantum software because quantum mechanics is non-intuitive. To facilitate software development and maintenance, code comments in quantum SDKs serve as natural-language explanations of program functionality and logical flow. Despite their importance, little research has systematically examined their value or offered constructive guidelines for programmers. To address this gap, our paper focuses on Qiskit, one of the most popular quantum SDKs, and presents CC4Q, the first dataset of code comments for quantum computing. CC4Q comprises 9677 code-comment pairs and 21970 sentence-level code comment units, the latter of which required extensive human annotation. For the annotation, we validate the applicability of the developer-intent taxonomy used for classical programs and propose a new taxonomy that captures quantum-specific knowledge. We conduct an empirical study that interprets code comments from three perspectives: comment structure and coverage, developers' intentions, and associated quantum topics. Our findings uncover key differences in code comments between classical and quantum software and outline the quantum-specific knowledge relevant to quantum software development.
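To make the two granularities mentioned above concrete, the following is a minimal sketch of how a code-comment pair and its sentence-level units could be obtained from Python source. It is an illustration only, not the procedure used to build CC4Q: the function names `extract_pairs` and `split_sentences`, the sample snippet, and the naive punctuation-based sentence splitter are all assumptions made here.

```python
import ast
import re

# Hypothetical Qiskit-style snippet used only for illustration.
SAMPLE_SOURCE = '''
def apply_hadamard(circuit, qubit):
    """Apply a Hadamard gate to one qubit. The qubit ends in an equal superposition."""
    circuit.h(qubit)
    return circuit
'''


def extract_pairs(source):
    """Return one {name, comment, code} record per documented function or class."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            doc = ast.get_docstring(node)
            if doc:  # skip undocumented definitions
                pairs.append({
                    "name": node.name,
                    "comment": doc,
                    "code": ast.get_source_segment(source, node),
                })
    return pairs


def split_sentences(comment):
    """Naively split a comment into sentence-level units on ., ! or ?."""
    text = " ".join(comment.split())  # collapse line breaks and indentation
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


if __name__ == "__main__":
    for pair in extract_pairs(SAMPLE_SOURCE):
        print(pair["name"], split_sentences(pair["comment"]))
```

Each record pairs a docstring with the code it documents, and each sentence of the docstring becomes a separate unit that could then be labeled by hand. A real pipeline would also need to handle inline comments, abbreviations, and formulas, which this splitter does not.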
