Quantum Large Language Model Fine-Tuning

Sang Hyub Kim, Jonathan Mei, Claudio Girotto, Masako Yamada, Martin Roetteler

TL;DR

The paper investigates a hybrid quantum-classical approach to fine-tuning pre-trained sentence transformers for sentiment analysis: the final classification head is replaced with a differentiable quantum circuit while the base LLM is kept frozen. It systematically studies how performance scales with the number of qubits, re-uploading steps, and encoders under shot and gate noise, finding that accuracy tends to improve with more qubits and with multi-encoder designs, reaching up to about $92.71\%$ on SST2, roughly $3.14\%$ above strong classical baselines. The work provides ablation analyses showing the quantum head’s positive contribution, demonstrates robustness to noise via differentiable shot-sampling, and estimates energy trade-offs between QPU and GPU inference, with an empirical upper bound near $93.62\%$ when end-to-end fine-tuning is allowed. Overall, the results suggest that quantum-inspired and quantum-enhanced heads can meaningfully augment few-shot sentiment classification, motivate larger-scale studies, and guide hardware-aware architecture design for near-term quantum devices.

Abstract

We introduce a hybrid quantum-classical deep learning architecture for large language model fine-tuning. The classical portion of the architecture is a sentence transformer that is powerful enough to display significant accuracy for complex tasks such as sentiment prediction. The quantum portion of the architecture consists of parameterized quantum circuits that utilize long-range connections between qubits. We analyze the performance of the hybrid models for various settings of hyperparameters, including the number of qubits, the depth of the quantum circuits, learning rate, number of re-uploading steps, etc. Based on a screening study of main effects, we show an overall improvement in prediction accuracy over a comparable classical baseline, with a trend of increasing accuracy with number of qubits. We observe up to $3.14\%$ improvements in accuracy over classical architectures of comparable model size, within the set of hyperparameters probed in this study. We demonstrate the contribution of each module in our architecture through ablation studies. Our studies are based on finite shot-counts and include simulations based on noisy quantum gates.

Paper Structure

This paper contains 38 sections, 8 equations, 12 figures, 16 tables.

Figures (12)

  • Figure 1: Block diagram of the last layer added to the base LLM, consisting of two modules. The first module, implemented on classical devices, contains a variable number $E$ of parallel encoders, which map the $768$-dimensional embedding vectors from SetFit to output vectors of dimension $Q_c$. In the case of a single encoder, $E = 1$. The second module, implemented on a QPU, contains a flexible data re-uploading module with a variable number $N$ of repetitions. The output of the end-to-end trainable module is a vector of dimension $Q_m$, the number of qubits measured, where $Q_m$ is fixed at 1 for all experiments.
  • Figure 2: The ansatz (left) consists of layers with increasing connectivity $C$ across the width of the circuit. Each block $U_i$ is defined as a combination of a controlled NOT and a single-qubit rotation $R_Y$ through angle $\theta_i$ around the Y-axis (right).
  • Figure 3: Example of a PQC with $Q=4$ qubits, 3 layers, and connectivity $C=2$.
  • Figure 4: The overall architecture, which includes both embedding and transformer blocks (shown in orange), is composed of the pretrained classical masked language model (left) and the quantum classification head (right).
  • Figure 5: Estimated energy consumption for inference increases significantly faster on GPU than on QPU, where GPU inference uses statevector simulation. The cross-over point occurs at 46 qubits. The ansatz configurations chosen are: $R=4$, $M=2$, $N=1$,
  • ...and 7 more figures
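
The ansatz described in the captions of Figures 2 and 3 can be illustrated with a small statevector simulation. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the captions give only the block contents (a controlled NOT plus an $R_Y(\theta_i)$ rotation) and the notion of a connectivity $C$, so the wiring used here (layer $\ell$ connects qubit $q$ to qubit $(q + d) \bmod Q$ with distance $d = (\ell \bmod C) + 1$, and the rotation acts on the target qubit) is hypothetical, as are all function names. The readout on a single qubit mirrors $Q_m = 1$ from Figure 1.

```python
import numpy as np

def apply_ry(state, theta, qubit, n_qubits):
    """Apply a single-qubit RY(theta) rotation to `qubit` (axis 0 = qubit 0)."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    gate = np.array([[c, -s], [s, c]])
    psi = state.reshape([2] * n_qubits)
    psi = np.moveaxis(psi, qubit, 0)                 # bring the qubit's axis to the front
    psi = np.tensordot(gate, psi, axes=([1], [0]))   # contract the gate with that axis
    psi = np.moveaxis(psi, 0, qubit)                 # restore the original axis order
    return psi.reshape(-1)

def apply_cnot(state, control, target, n_qubits):
    """Apply CNOT: flip `target` on the slice where `control` is 1."""
    psi = state.reshape([2] * n_qubits).copy()
    idx = [slice(None)] * n_qubits
    idx[control] = 1
    sub = psi[tuple(idx)]                            # control axis is dropped in this view
    t_axis = target if target < control else target - 1
    psi[tuple(idx)] = np.flip(sub, axis=t_axis).copy()
    return psi.reshape(-1)

def ansatz_state(thetas, n_qubits, n_layers, connectivity):
    """Build the state prepared by the (assumed) layered CNOT + RY ansatz."""
    # Assumption: connectivity < n_qubits, so control and target always differ.
    thetas = np.asarray(thetas, dtype=float).reshape(n_layers, n_qubits)
    state = np.zeros(2 ** n_qubits)
    state[0] = 1.0                                   # start in |0...0>
    for layer in range(n_layers):
        d = layer % connectivity + 1                 # hypothetical connection distance
        for q in range(n_qubits):
            t = (q + d) % n_qubits
            state = apply_cnot(state, q, t, n_qubits)
            state = apply_ry(state, thetas[layer, q], t, n_qubits)
    return state

def expect_z(state, qubit, n_qubits):
    """Exact expectation value <Z> on one qubit (infinite-shot limit)."""
    probs = np.abs(state.reshape([2] * n_qubits)) ** 2
    other_axes = tuple(a for a in range(n_qubits) if a != qubit)
    p = probs.sum(axis=other_axes)
    return float(p[0] - p[1])

if __name__ == "__main__":
    # Configuration from the Figure 3 caption: Q = 4 qubits, 3 layers, C = 2.
    rng = np.random.default_rng(0)
    thetas = rng.uniform(0, 2 * np.pi, size=(3, 4))
    psi = ansatz_state(thetas, n_qubits=4, n_layers=3, connectivity=2)
    print("norm:", np.linalg.norm(psi))
    print("<Z> on qubit 0:", expect_z(psi, 0, 4))
```

The sketch returns the exact expectation value; the study itself works with finite shot counts and noisy gates, which this example does not model.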