KetGPT - Dataset Augmentation of Quantum Circuits using Transformers

Boran Apak, Medina Bandic, Aritra Sarkar, Sebastian Feld

TL;DR

KetGPT addresses the shortage of realistic quantum circuit benchmarks by generating synthetic, algorithm-like QASM circuits with a transformer-based decoder. A custom QASM tokenizer ensures syntactic validity, while a separate DistilBERT-based classifier, trained to distinguish real from random circuits, evaluates realism; on its test set it misclassifies only 6 QASM files (Figure 4). Structural analysis shows that KetGPT circuits cluster with real circuits across multiple metrics, supporting their use as benchmarks. By expanding the pool of representative circuits, the approach could support AI-driven quantum compilers, system benchmarking, and scalable optimization research.

Abstract

Quantum algorithms, represented as quantum circuits, can be used as benchmarks for assessing the performance of quantum systems. Existing datasets, widely utilized in the field, suffer from limitations in size and versatility, leading researchers to employ randomly generated circuits. Random circuits are, however, not representative benchmarks as they lack the inherent properties of real quantum algorithms for which the quantum systems are manufactured. This shortage of 'useful' quantum benchmarks poses a challenge to advancing the development and comparison of quantum compilers and hardware. This research aims to enhance the existing quantum circuit datasets by generating what we refer to as 'realistic-looking' circuits by employing the Transformer machine learning architecture. For this purpose, we introduce KetGPT, a tool that generates synthetic circuits in the OpenQASM language, whose structure is based on quantum circuits derived from existing quantum algorithms and follows the typical patterns of human-written algorithm-based code (e.g., order of gates and qubits). Our three-fold verification process, involving manual inspection and Qiskit framework execution, transformer-based classification, and structural analysis, demonstrates the efficacy of KetGPT in producing large amounts of additional circuits that closely align with algorithm-based structures. Beyond benchmarking, we envision KetGPT contributing substantially to AI-driven quantum compilers and systems.
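
To make the structural-analysis step of the verification more concrete, the following is a minimal sketch of how a few structural metrics could be read off a QASM file. The summary above does not name the metrics that were used, so the three chosen here (gate count, fraction of two-qubit gates, circuit depth) are assumptions for illustration, as are the function name `structural_metrics` and the simplified line-based parsing, which only handles one gate statement per line with explicitly indexed qubits.

```python
import re

# One gate statement per line, e.g. "h q[0];" or "rz(pi/2) q[2];".
# Simplification (assumption): no nested parentheses in gate parameters.
GATE_LINE = re.compile(r"^\s*([a-z][a-z0-9_]*)\s*(?:\([^)]*\))?\s+(.+?);\s*$")
QUBIT = re.compile(r"(\w+)\[(\d+)\]")
# Statements that are not gates and are ignored by the metrics.
NON_GATES = {"include", "qreg", "creg", "measure", "barrier"}


def structural_metrics(qasm_text):
    """Return illustrative structural metrics for a QASM circuit."""
    gate_count = 0
    two_qubit_gates = 0
    layer_of = {}  # (register, index) -> last depth layer using that qubit

    for line in qasm_text.splitlines():
        match = GATE_LINE.match(line)
        if not match or match.group(1) in NON_GATES:
            continue
        qubits = QUBIT.findall(match.group(2))
        if not qubits:
            continue  # skip register-wide statements such as "h q;"

        gate_count += 1
        if len(qubits) == 2:
            two_qubit_gates += 1

        # A gate sits one layer after the latest layer of any qubit it touches.
        layer = 1 + max(layer_of.get(q, 0) for q in qubits)
        for q in qubits:
            layer_of[q] = layer

    return {
        "gate_count": gate_count,
        "two_qubit_fraction": two_qubit_gates / gate_count if gate_count else 0.0,
        "depth": max(layer_of.values(), default=0),
    }


example = """OPENQASM 2.0;
include "qelib1.inc";
qreg q[3];
h q[0];
h q[1];
cx q[0],q[1];
rz(pi/2) q[2];
cx q[1],q[2];
"""

metrics = structural_metrics(example)
print(metrics)
```

Computing such a metric vector for every real, random, and generated circuit would give points that can be compared or clustered; which metrics and which comparison method the authors actually used is not stated in this summary.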

Paper Structure

This paper contains 17 sections, 1 equation, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Tokenization example. A sequence of QASM operations (in text-file form) is provided as input, and each statement (a line of QASM code) is assigned a number. The number has no intuitive meaning; it depends only on how the tokenization algorithm orders its vocabulary. Tokenizing a sequence of statements therefore yields a list of numbers. Both the gate and the qubit(s) it is applied to determine the token: for instance, h q[0]; and h q[1]; are assigned different numbers, as shown.
  • Figure 1: Generator model settings
  • Figure 2: KetGPT workflow. First, a given text prompt is tokenized. The tokens are fed into the KetGPT model, which was trained on quantum circuits from an existing quantum circuit database. KetGPT then generates text that continues the prompt, yielding a synthetic circuit. A separate transformer classifier, trained to distinguish real from random quantum circuits, tests whether the generated circuit is realistic. If the test is positive, the circuit can be used to augment the quantum circuit database.
  • Figure 3: Side-by-side comparison of the lines of 6-qubit QASM files: a circuit generated by KetGPT (a), an algorithm-based circuit (b), and a random circuit (c).
  • Figure 4: Classifier performance on a test dataset, illustrated by a confusion matrix. The diagonal entries count correctly classified files; off the diagonal, only 5 QASM files that are actually "Real" are predicted as "Random", and 1 QASM file that is "Random" is predicted as "Real".
  • ...and 1 more figure
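
The statement-level tokenization described in the Figure 1 caption can be sketched in a few lines. This is an illustrative stand-in rather than the authors' tokenizer: the class name `StatementTokenizer` and its methods are hypothetical, and token IDs are assigned here in order of first appearance, whereas the caption only says the numbering depends on how the tokenization algorithm orders its vocabulary.

```python
def split_statements(qasm_text):
    """Return the non-empty lines of a QASM text, one statement per line."""
    statements = []
    for line in qasm_text.splitlines():
        line = line.split("//")[0].strip()  # drop trailing comments
        if line:
            statements.append(line)
    return statements


class StatementTokenizer:
    """Maps each whole QASM statement (gate plus qubits) to one integer."""

    def __init__(self):
        self.token_of = {}       # statement -> token id
        self.statement_of = []   # token id -> statement

    def fit(self, corpus):
        """Build the vocabulary from an iterable of QASM texts."""
        for text in corpus:
            for statement in split_statements(text):
                if statement not in self.token_of:
                    self.token_of[statement] = len(self.statement_of)
                    self.statement_of.append(statement)
        return self

    def encode(self, qasm_text):
        """Turn a QASM text into a list of token ids (KeyError if unseen)."""
        return [self.token_of[s] for s in split_statements(qasm_text)]

    def decode(self, token_ids):
        """Turn a list of token ids back into QASM text."""
        return "\n".join(self.statement_of[i] for i in token_ids)


circuit = "h q[0];\nh q[1];\ncx q[0],q[1];\nh q[0];"
tokenizer = StatementTokenizer().fit([circuit])
token_ids = tokenizer.encode(circuit)
print(token_ids)  # [0, 1, 2, 0]
```

Because every token decodes to a complete statement, any generated token sequence maps back to well-formed lines of QASM, which is consistent with the TL;DR's remark that the custom tokenizer ensures syntactic validity. Note that h q[0]; and h q[1]; receive different ids, as in the figure.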