QMill: Representative Quantum Data Generation for Quantum Machine Learning Utility

Jason Ludmir; Ian Martin; Nicholas S. DiBrita; Daniel Leeds; Tirthak Patel

TL;DR

QMill addresses the shortage of representative quantum data for quantum machine learning (QML) by generating low-depth, entangled samples whose CE values follow user-defined distributions. It combines a library of lightweight ansatzes, dual-annealing optimization that matches CE distributions by minimizing total variation distance (TVD), and SWAP-test-based diversity checks, producing scalable, entanglement-aware datasets. The framework is validated on both classical datasets mapped to quantum amplitudes and native quantum datasets, showing faithful CE distribution replication and resilience to noise; a three-qubit quantum neural network (QNN) trained on QMill data attains performance near a classical baseline. Open-source code and datasets are provided, offering a practical tool for benchmarking and advancing QML under realistic quantum-data conditions.
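As a rough illustration of the distribution-matching objective mentioned above, the sketch below computes the total variation distance (TVD) between two binned distributions. It is a minimal example rather than QMill's implementation: the function name `total_variation_distance` and the three-bin histograms are hypothetical, chosen only to show the metric that an optimizer such as dual annealing would drive toward zero.

```python
import numpy as np

def total_variation_distance(p, q):
    """Return the TVD between two discrete distributions given as bin weights."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so that raw histogram counts are accepted as well as probabilities.
    p = p / p.sum()
    q = q / q.sum()
    # TVD is half the L1 distance between the two probability vectors.
    return 0.5 * np.abs(p - q).sum()

# Hypothetical target and generated histograms over three CE bins.
target = [0.1, 0.4, 0.5]
generated = [0.2, 0.3, 0.5]
tvd = total_variation_distance(target, generated)
print(f"TVD = {tvd:.3f}")
```

A TVD of 0 means the generated histogram matches the target exactly, and a TVD of 1 means the two distributions share no bins.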

Abstract

Quantum machine learning (QML) promises significant speedups, particularly when operating on quantum datasets. However, its progress is hindered by the scarcity of suitable training data. Existing synthetic data generation methods fall short in capturing essential entanglement properties, limiting their utility for QML. To address this, we introduce QMill, a low-depth quantum data generation framework that produces entangled, high-quality samples emulating diverse classical and quantum distributions, enabling more effective development and evaluation of QML models in representative-data settings.
