PennyCoder: Efficient Domain-Specific LLMs for PennyLane-Based Quantum Code Generation
Abdul Basit, Minghao Shao, Muhammad Haider Asif, Nouhaila Innan, Muhammad Kashif, Alberto Marchisio, Muhammad Shafique
TL;DR
The paper addresses the privacy, latency, and cost challenges of cloud-based quantum code assistants by introducing PennyCoder, a lightweight on-device framework for PennyLane-based quantum code generation. It combines domain-specific instruction tuning on the PennyLang dataset with parameter-efficient LoRA fine-tuning of a LLaMA 3.1-8B foundation model, augmented by an optional retrieval-augmented generation (RAG) module for long-tail tasks. Empirical results show PennyCoder achieving 44.32% accuracy on PennyLane tasks, outperforming both the base model and a RAG-augmented baseline while remaining locally deployable. This work demonstrates a practical path toward private, on-device quantum programming assistants that can support quantum machine learning (QML) and quantum reinforcement learning (QRL) workflows, and outlines future directions for data augmentation and hardware-aware enhancements.
Abstract
The growing demand for robust quantum programming frameworks has exposed a critical limitation: current large language model (LLM)-based quantum code assistants rely heavily on remote APIs, which introduces privacy risks, latency, and high usage costs. To address this gap, we propose PennyCoder, a novel lightweight framework for quantum code generation designed for local and embedded deployment, enabling on-device quantum programming assistance without external API dependence. PennyCoder builds on the LLaMA 3.1-8B model, fine-tuned with parameter-efficient Low-Rank Adaptation (LoRA) and domain-specific instruction tuning targeted at the specialized syntax and computational logic of quantum programming in PennyLane, including tasks in quantum machine learning and quantum reinforcement learning. Unlike prior work focused on cloud-based quantum code generation, our approach emphasizes device-native operability while maintaining high model efficacy. We evaluated PennyCoder on a comprehensive quantum programming dataset, where the fine-tuned model achieved 44.3% accuracy, compared to 33.7% for the base LLaMA 3.1-8B and 40.1% for the RAG-augmented baseline. These gains of 10.6 and 4.2 percentage points, respectively, demonstrate a significant improvement in functional correctness.
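To make the parameter-efficiency argument behind LoRA concrete, the following is a minimal sketch in plain Python of the general low-rank update, W + (alpha / r) * B A, in which the base weight W stays frozen and only the small factors B and A are trained. It is not PennyCoder's training code: the helper names, matrix shapes, rank, and alpha value are all hypothetical choices made for illustration.

```python
# Illustrative LoRA sketch in plain Python (no external libraries).
# All shapes, the rank, and alpha are hypothetical, not PennyCoder's settings.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_effective_weight(w, b, a, alpha):
    """Return W + (alpha / r) * B @ A, where r is the shared rank of B and A."""
    r = len(a)  # A has shape (r, k); B has shape (d, r)
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d, k, r):
    """Compare parameter counts: full fine-tuning vs. a rank-r adapter."""
    return {"full": d * k, "lora": r * (d + k)}

# Frozen 3x2 weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
B = [[1.0], [0.0], [2.0]]   # shape (3, 1)
A = [[0.5, -0.5]]           # shape (1, 2)

W_eff = lora_effective_weight(W, B, A, alpha=2.0)

# Hypothetical 4096 x 4096 projection adapted at rank 16.
counts = trainable_params(d=4096, k=4096, r=16)
```

For the hypothetical 4096 x 4096 layer, the rank-16 adapter trains 131,072 parameters instead of 16,777,216, a 128-fold reduction for that layer. Savings of this kind are what make fine-tuning an 8B-parameter model feasible on modest hardware.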
